Abstract

Inspired by the problem of sensory coding in neuroscience, we study the maximum entropy distribution
on weighted graphs with a given expected degree sequence. This distribution on graphs is characterized
by independent edge weights parameterized by vertex potentials at each node. Using the general theory of 
exponential family distributions, we prove existence and uniqueness of the maximum likelihood estimator 
(MLE) of the vertex parameters. We also prove in several cases the surprising consistency of the MLE 
from a single graph sample, extending results of Chatterjee, Diaconis, and Sly for unweighted (binary) 
graphs. Interestingly, our extensions require an intricate study of the inverses of diagonally dominant 
positive matrices. Along the way, we derive analogues of the Erdős-Gallai criterion of graphic sequences
for weighted graphs. 

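Notation: We use the notation $\mathbb{R}_+ = (0, \infty)$, $\mathbb{R}_0 = [0, \infty)$, $\mathbb{N} = \{1, 2, \dots\}$, and $\mathbb{N}_0 = \{0, 1, 2, \dots\}$. We
write $\sum_{\{i,j\}}$ and $\prod_{\{i,j\}}$ for the summation and product, respectively, over all $\binom{n}{2}$ pairs $\{i,j\}$ with $i \neq j$. Given
a subset $C$ of $\mathbb{R}^n$, we let $C^\circ$ and $\overline{C}$ denote the interior and closure of $C$ in $\mathbb{R}^n$, respectively. For a vector
$x = (x_1, \dots, x_n) \in \mathbb{R}^n$, we set $\|x\|_\infty = \max_{1 \le i \le n} |x_i|$ to be the $\ell_\infty$-norm of $x$.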

1 Introduction 

Maximum entropy models are an important class of statistical models for biology. For instance, they have 
been found recently to model well protein folding [30], [36], antibody diversity [25], neural population activity
[351 [Ml [551 [371 m [35] , and flock behavior |S]. Here, we develop a general framework for studying maximum 
entropy distributions on weighted graphs, extending recent work of Chatterjee, Diaconis, and Sly [8]. Our 
motivation for developing this theory comes from the problem of sensory coding in neuroscience. 

In the brain, information is represented by discrete electrical pulses, called action potentials or spikes [29].
This includes neural representations of sensory stimuli which can take on a continuum of values. For instance, 
large photoreceptor arrays in the retina respond to a range of light intensities in a visual environment, but 
the brain does not receive information from these photoreceptors directly. Instead, retinal ganglion cells must 
convey this detailed input to the visual cortex using only a series of binary electrical signals. Continuous 
stimuli are therefore converted by networks of neurons to sequences of spike times.

An unresolved controversy in neuroscience is whether information is contained in the precise timings of 
these spikes or only in their "rates" (i.e., counts of spikes in a window of time). Early theoretical studies 
[23] suggest that information capacities of timing-based codes are superior to those that are rate-based (also
see [17] for an implementation in a simple model). Moreover, a number of scientific articles have appeared
suggesting that precise spike timing [1], [3], [27], [40], [22], [6], [24], [26], [11], [19] and synchrony [39] are important
for various computations in the brain.^1 Here, we briefly explain a possible scheme for encoding continuous
vectors with spiking neurons that takes advantage of precise spike timing and the mathematics of maximum 
entropy distributions. A more detailed examination of this model will appear in a future work. 

Consider a network of $n$ neurons in one region of the brain which transmit a continuous vector $\theta \in \mathbb{R}^n$
using sequences of spikes to a second receiver region. We assume that this second region contains a number of
coincidence detectors that measure the absolute difference in spike times between pairs of neurons projecting
from the first region. We imagine three scenarios for how information can be obtained by these detectors.
In the first, the detector only measures synchrony between spikes; that is, the detector assigns
a 0 to a nonzero timing difference and a 1 to a coincidence of spikes. In the second scenario, timing differences
between projecting neurons can assume an infinite but countable number of possible values. Finally, in the
third scenario, we allow these differences to take on any nonnegative real value. We further assume that
neuronal output and thus spike times are stochastic variables. A basic question now arises: How can the first
region encode $\theta$ so that it can be recovered robustly by the second?

We answer this question by first asking the one symmetric to this: How can the second region recover
a real vector transmitted by an unknown sender region from spike timing measurements? We propose the
following solution to this problem. Fix one of the detector mechanics as described above, and set $a_{ij}$ to be the
measurement of the absolute timing difference between spikes from projecting neurons $i$ and $j$. We assume
that the receiver population can compute the (local) sums $d_i = \sum_{j \neq i} a_{ij}$ efficiently. The values $a = (a_{ij})$
represent a weighted graph $G$ on $n$ vertices, and we assume that $a$ is randomly drawn from a distribution
on timing measurements $(A_{ij})$. Making no further assumptions, a principle of Jaynes [18] suggests that the
second region propose that the timing differences are drawn from the (unique) distribution over weighted
graphs with the most entropy [331 HH] having the vector $d = (d_1, \dots, d_n)$ for the expectations $\mathbb{E}\big[\sum_{j \neq i} A_{ij}\big]$
of the degree sums $\sum_{j \neq i} A_{ij}$. Depending on which of the three scenarios described above is true for the
coincidence detector, this prescription produces one of three different maximum entropy distributions.

Consider the third scenario above (the other cases are also subsumed by our results). As we shall see in
Section 3.2, the distribution determined in this case is parameterized by a real vector $\theta = (\theta_1, \dots, \theta_n)$, and
the maximum likelihood estimator (MLE) for these parameters using $d$ as sufficient statistics boils down to
solving the following set of $n$ algebraic equations in the $n$ unknowns $\theta_1, \dots, \theta_n$:

$$d_i = \sum_{j \neq i} \frac{1}{\theta_i + \theta_j} \quad \text{for } i = 1, \dots, n. \tag{1}$$

Given our motivation, we call the system of equations (1) the retina equations for theoretical neuroscience,
and note that they have been studied in a more general context by Sanyal, Sturmfels, and Vinzant [31] using
matroid theory and algebraic geometry. Somewhat remarkably, a solution to (1) has the property that it is
arbitrarily close to the original parameters $\theta$ for sufficiently large network sizes $n$ (in the scenario of binary
measurements, this is a result of [8]). In particular, it is possible for the receiver region to recover reliably a
continuous vector $\theta$ from a single cycle of neuronal firing emanating from the sender region.

We now know how to answer our first question: The sender region should arrange spike timing differences
to come from a maximum entropy distribution. We remark that this conclusion is consistent with modern
paradigms in artificial intelligence, such as the concept of the Boltzmann machine [2], which is a stochastic
version of its (zero-temperature) deterministic limit, the Little-Hopfield network [^ I16j .

The organization of this paper is as follows. In Section 2, we lay out the basic theory of maximum entropy
distributions on graphs. Section 3 is devoted to specializing the theory to three common weight sets and
contains an extension of the Erdős-Gallai graphic sequence criterion for weighted graphs. In Section 4, we
prove the consistency of the MLE from a single sample in the examples from Section 3. A key step in our
proofs is a new inequality for the norm of the inverses of positive, symmetric diagonally dominant matrices
from [15]. Finally, Appendix A contains some facts about subexponential random variables.

^1 Although it is well known that precise spike timing is used for time-disparity computation in animals [7], such as when owls
track prey with binaural hearing or when electric fish use electric fields around their bodies for locating objects.

2 General theory via exponential family distributions

In this section we develop the general machinery of maximum entropy distributions on graphs via the theory
of exponential family distributions [41], and in subsequent sections we analyze particular cases. Consider
an undirected graph $G$ on $n \ge 2$ vertices with edge $(i,j)$ having weight $a_{ij} \in S$, where $S \subseteq \mathbb{R}$ is the set
of possible weight values. We will later consider the specific cases $S = \{0, 1\}$ (unweighted graphs), $S = \mathbb{R}_0$
(weighted graphs with continuous weights), and $S = \mathbb{N}_0$ (weighted graphs with discrete weights). A graph $G$
is fully specified by its adjacency matrix $a = (a_{ij})_{i,j=1}^n$, which is an $n \times n$ symmetric matrix with zeros along
its diagonal. A probability distribution over graphs $G$ corresponds to a distribution over adjacency matrices
$a = (a_{ij}) \in S^{\binom{n}{2}}$. Given a graph with adjacency matrix $a = (a_{ij})$, let $\deg_i(a) = \sum_{j \neq i} a_{ij}$ be the degree of
vertex $i$, and let $\deg(a) = (\deg_1(a), \dots, \deg_n(a))$ be the degree sequence of $a$.

Let $\mathcal{S}$ be a $\sigma$-algebra over the set of weight values $S$. Assume there is a canonical $\sigma$-finite
measure $\nu$ on $(S, \mathcal{S})$. Let $\nu^{\binom{n}{2}}$ be the product measure on $S^{\binom{n}{2}}$. Let $\mathfrak{P}$ be the set of all probability
distributions on $S^{\binom{n}{2}}$ that are absolutely continuous with respect to $\nu^{\binom{n}{2}}$. Since $\nu^{\binom{n}{2}}$ is $\sigma$-finite, these probability
distributions can be characterized by their density functions, i.e. the Radon-Nikodym derivatives with respect
to $\nu^{\binom{n}{2}}$. Given a sequence $d = (d_1, \dots, d_n) \in \mathbb{R}^n$, let $\mathfrak{P}_d$ be the set of distributions in $\mathfrak{P}$ whose expected
degree sequence is equal to $d$,

$$\mathfrak{P}_d = \{P \in \mathfrak{P} : \mathbb{E}_P[\deg(A)] = d\},$$

where in the definition above, the random variable $A = (A_{ij}) \in S^{\binom{n}{2}}$ is drawn from the distribution $P$. Then
the distribution $P^*$ in $\mathfrak{P}_d$ with maximum entropy is the exponential family distribution with the degree
sequence as sufficient statistics [41, Chapter 3]. That is, the density of $P^*$ at $a = (a_{ij}) \in S^{\binom{n}{2}}$ is given by^2

$$p^*(a) = \exp\big(-\theta^\top \deg(a) - Z(\theta)\big), \tag{2}$$

where $Z(\theta)$ is the log-partition function,

$$Z(\theta) = \log \int_{S^{\binom{n}{2}}} \exp\big(-\theta^\top \deg(a)\big)\, \nu^{\binom{n}{2}}(da),$$

and $\theta = (\theta_1, \dots, \theta_n)$ belongs to the natural parameter space

$$\Theta = \{\theta \in \mathbb{R}^n : Z(\theta) < \infty\}.$$
Recalling that $\deg_i(a) = \sum_{j \neq i} a_{ij}$, we can write

$$\exp\big(-\theta^\top \deg(a)\big) = \exp\Big(-\sum_{i=1}^n \sum_{j \neq i} \theta_i a_{ij}\Big) = \exp\Big(-\sum_{\{i,j\}} (\theta_i + \theta_j) a_{ij}\Big) = \prod_{\{i,j\}} \exp\big(-(\theta_i + \theta_j) a_{ij}\big).$$

Hence, we can express the log-partition function as

$$Z(\theta) = \log \prod_{\{i,j\}} \int_S \exp\big(-(\theta_i + \theta_j) a_{ij}\big)\, \nu(da_{ij}) = \sum_{\{i,j\}} Z_1(\theta_i + \theta_j),$$

in which $Z_1(t)$ is the marginal log-partition function

$$Z_1(t) = \log \int_S \exp(-ta)\, \nu(da).$$

Consequently, the density in (2) can be written as

$$p^*(a) = \prod_{\{i,j\}} \exp\big(-(\theta_i + \theta_j) a_{ij} - Z_1(\theta_i + \theta_j)\big),$$

from which we see that the edge weights $A_{ij}$ are independent random variables, with $A_{ij} \in S$ having
distribution $P^*_{ij}$ with density

$$p^*_{ij}(a) = \exp\big(-(\theta_i + \theta_j) a - Z_1(\theta_i + \theta_j)\big).$$

In particular, the edge weights belong to the same exponential family distribution but with different
parameters that depend on $\theta_i$ and $\theta_j$ (or rather, on their sum $\theta_i + \theta_j$). The parameters $\theta_1, \dots, \theta_n$ can be
interpreted as the potential at each vertex that determines how strongly the vertices are connected to each
other. Furthermore, we can write the natural parameter space as

$$\Theta = \{\theta \in \mathbb{R}^n : Z_1(\theta_i + \theta_j) < \infty \text{ for all } i \neq j\}.$$

^2 We choose to use $-\theta$ in the parameterization and not the canonical parameterization $p^*(a) \propto \exp(\theta^\top \deg(a))$, because
it simplifies the notation in our later presentation.
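The factorization of the log-partition function into a sum of marginal terms can be checked numerically. The following is a minimal sketch, assuming the binary weight set $S = \{0, 1\}$ with counting measure (the case treated in Section 3.1), for which $Z_1(t) = \log(1 + \exp(-t))$. The function names and the example vector are illustrative choices, not taken from the paper.

```python
import itertools
import math


def log_partition_bruteforce(theta):
    """Z(theta) computed by summing exp(-theta . deg(a)) over all binary graphs."""
    n = len(theta)
    pairs = list(itertools.combinations(range(n), 2))
    total = 0.0
    for weights in itertools.product((0, 1), repeat=len(pairs)):
        deg = [0] * n
        for (i, j), w in zip(pairs, weights):
            deg[i] += w
            deg[j] += w
        total += math.exp(-sum(t * d for t, d in zip(theta, deg)))
    return math.log(total)


def log_partition_factored(theta):
    """Sum over pairs {i, j} of Z_1(theta_i + theta_j) = log(1 + exp(-(theta_i + theta_j)))."""
    n = len(theta)
    return sum(
        math.log1p(math.exp(-(theta[i] + theta[j])))
        for i, j in itertools.combinations(range(n), 2)
    )


# Hypothetical example parameters on n = 4 vertices (64 graphs in total).
theta_example = [0.3, -0.5, 1.2, 0.1]
z_brute = log_partition_bruteforce(theta_example)
z_fact = log_partition_factored(theta_example)
```

The brute-force sum grows as $2^{\binom{n}{2}}$, so it is only practical for very small $n$; the factored form is what makes the model tractable.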

Going back to the characterization of $P^*$ as the maximum entropy distribution in $\mathfrak{P}_d$, the condition
$P^* \in \mathfrak{P}_d$ means that we choose the parameter $\theta$ such that $\mathbb{E}_{P^*}[\deg(A)] = d$. Equivalently, noting that
$-\nabla Z(\theta) = \mathbb{E}_{P^*}[\deg(A)]$, the solution to $-\nabla Z(\theta) = d$ is precisely the maximum likelihood estimator (MLE)
of $\theta$ given an empirical degree sequence $d \in \mathbb{R}^n$. For instance, the vector $d$ can be the average of the degree
sequences of graphs $G_1, \dots, G_m$ drawn i.i.d. from the distribution $P^*$. We will later study the properties of
the MLE of $\theta$ from a single sample $G \sim P^*$. For now, we address the question regarding the existence and
uniqueness of this MLE assuming the degree sequence $d$ is given.

Define the mean parameter space $\mathcal{M}$ to be the set of expected degree sequences from all distributions on
$S^{\binom{n}{2}}$ that are absolutely continuous with respect to $\nu^{\binom{n}{2}}$:

$$\mathcal{M} = \{\mathbb{E}_P[\deg(A)] : P \in \mathfrak{P}\}.$$

The set $\mathcal{M}$ is necessarily convex, since a convex combination of probability distributions in $\mathfrak{P}$ is also a
probability distribution in $\mathfrak{P}$. Recall that an exponential family distribution is minimal if there is no nontrivial linear
combination of the sufficient statistics that is constant almost surely with respect to the base distribution.
This is true for $P^*$, for which the sufficient statistics are the degree sequence. We say that $P^*$ is regular
if the natural parameter space $\Theta$ is open. By the general theory of exponential family distributions [41,
Theorem 3.3], in a regular and minimal exponential family distribution, the gradient of the log-partition
function maps the natural parameter space $\Theta$ to the interior of the mean parameter space $\mathcal{M}$, and this
mapping^3

$$-\nabla Z : \Theta \to \mathcal{M}^\circ$$

is bijective. Thus we have established the following.

Proposition 2.1. Assume $\Theta$ is open. Then there exists a solution $\theta \in \Theta$ to the MLE equation $\mathbb{E}_{P^*}[\deg(A)] = d$
if and only if $d \in \mathcal{M}^\circ$, and if such a solution exists then it is unique.

It remains to characterize the mean parameter space $\mathcal{M}$. We say that a sequence $d = (d_1, \dots, d_n)$ is
graphic if $d$ is the degree sequence of a graph $G$ with edge weights in $S$, and in this case we say that $G$
realizes $d$. It is important to note that whether a sequence $d$ is graphic depends on the weight set $S$, which
we fix for now. Let $\mathcal{W}$ be the set of all graphic sequences, and let $\mathrm{conv}(\mathcal{W})$ be the convex hull of $\mathcal{W}$. Clearly
we have $\mathcal{M} \subseteq \mathrm{conv}(\mathcal{W})$, because any element of $\mathcal{M}$ is of the form $\mathbb{E}_P[\deg(A)]$ for some distribution $P$ and
$\deg(A) \in \mathcal{W}$ for every realization of the random variable $A$. On the other hand, suppose $\mathfrak{P}$ contains the
Dirac delta measures $\delta_B$ for each $B \in S^{\binom{n}{2}}$. Given $d \in \mathcal{W}$, let $B$ be the adjacency matrix of the graph that
realizes $d$. Then $d = \mathbb{E}_{\delta_B}[\deg(A)] \in \mathcal{M}$, which means $\mathcal{W} \subseteq \mathcal{M}$, and hence $\mathrm{conv}(\mathcal{W}) \subseteq \mathcal{M}$ since $\mathcal{M}$ is convex.
Thus in this case we have $\mathcal{M} = \mathrm{conv}(\mathcal{W})$. However, in general $\mathfrak{P}$ might not contain Dirac measures, and we
need to look at the specific structure of $\mathfrak{P}$ to decide whether $\mathrm{conv}(\mathcal{W}) \subseteq \mathcal{M}$.

We emphasize the distinction between a valid solution $\theta \in \Theta$ and a general solution $\theta \in \mathbb{R}^n$ to the
MLE equation $\mathbb{E}_{P^*}[\deg(A)] = d$. As we saw from Proposition 2.1, we have a precise characterization of the
existence and uniqueness of the valid solution $\theta \in \Theta$, but in general, there are multiple solutions $\theta$ to the
MLE equation. In this paper we shall be concerned only with the valid solution; Sanyal, Sturmfels, and
Vinzant study some algebraic properties of general solutions [31].

^3 The presence of the minus sign in the mapping $-\nabla Z$ is due to our choice of the parameterization in (2) using $-\theta$.

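We close this section by discussing the symmetry of the valid solution to the MLE equation. Let
$\mathrm{Dom}(Z_1) = \{t \in \mathbb{R} : Z_1(t) < \infty\}$, and let $\mu : \mathrm{Dom}(Z_1) \to \mathbb{R}$ be the mean function

$$\mu(t) = \int_S a \exp\big(-ta - Z_1(t)\big)\, \nu(da).$$

Observing that we can write

$$\mathbb{E}_{P^*}[A_{ij}] = \int_S a \exp\big(-(\theta_i + \theta_j) a - Z_1(\theta_i + \theta_j)\big)\, \nu(da) = \mu(\theta_i + \theta_j),$$

the MLE equation $\mathbb{E}_{P^*}[\deg(A)] = d$ then becomes

$$d_i = \sum_{j \neq i} \mu(\theta_i + \theta_j) \quad \text{for } i = 1, \dots, n. \tag{3}$$

In the statement below, sgn denotes the sign function: $\mathrm{sgn}(t) = t/|t|$ if $t \neq 0$, and $\mathrm{sgn}(0) = 0$.

Proposition 2.2. Let $d \in \mathcal{M}^\circ$, and let $\theta \in \Theta$ be the unique solution to the system of equations (3). If $\mu$ is
strictly increasing, then

$$\mathrm{sgn}(d_i - d_j) = \mathrm{sgn}(\theta_i - \theta_j) \quad \text{for all } i \neq j,$$

and similarly, if $\mu$ is strictly decreasing, then

$$\mathrm{sgn}(d_i - d_j) = \mathrm{sgn}(\theta_j - \theta_i) \quad \text{for all } i \neq j.$$

Proof. Given $i \neq j$,

$$d_i - d_j = \Big(\mu(\theta_i + \theta_j) + \sum_{k \neq i,j} \mu(\theta_i + \theta_k)\Big) - \Big(\mu(\theta_j + \theta_i) + \sum_{k \neq i,j} \mu(\theta_j + \theta_k)\Big) = \sum_{k \neq i,j} \big(\mu(\theta_i + \theta_k) - \mu(\theta_j + \theta_k)\big).$$

If $\mu$ is strictly increasing, then $\mu(\theta_i + \theta_k) - \mu(\theta_j + \theta_k)$ has the same sign as $\theta_i - \theta_j$ for each $k \neq i, j$, and thus
$d_i - d_j$ also has the same sign as $\theta_i - \theta_j$. Similarly, if $\mu$ is strictly decreasing, then $\mu(\theta_i + \theta_k) - \mu(\theta_j + \theta_k)$ has
the opposite sign of $\theta_i - \theta_j$, and thus $d_i - d_j$ also has the opposite sign of $\theta_i - \theta_j$. □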

3 Specific edge weight cases 

We now consider specific choices of the weight set $S$. In each case we investigate the distribution of the edge
weights $A_{ij}$, the natural parameter space $\Theta$, and characterize the mean parameter space $\mathcal{M}$.

3.1 Unweighted graphs 

Let $S = \{0, 1\}$ and $\nu$ be the counting measure so that we consider simple unweighted graphs. In this case,

$$Z_1(t) = \log\big(1 + \exp(-t)\big) < \infty \quad \text{for all } t \in \mathbb{R},$$

so $\mathrm{Dom}(Z_1) = \mathbb{R}$ and the natural parameter space is $\Theta = \mathbb{R}^n$, which is open. The edge weights $A_{ij}$ are then
independent Bernoulli random variables with

$$\mathbb{P}^*(A_{ij} = 1) = \frac{\exp(-\theta_i - \theta_j)}{1 + \exp(-\theta_i - \theta_j)} = \frac{1}{1 + \exp(\theta_i + \theta_j)}.$$

This model has been studied recently by Chatterjee, Diaconis, and Sly [8] in the context of graph limits.
When $\theta_1 = \theta_2 = \dots = \theta_n = t$, we recover the classical Erdős-Rényi model with edge emission probability
$p = 1/(1 + \exp(2t))$.

Since $\nu^{\binom{n}{2}}$ is the counting measure on $\{0, 1\}^{\binom{n}{2}}$, all distributions on $\{0, 1\}^{\binom{n}{2}}$ are absolutely continuous
with respect to $\nu^{\binom{n}{2}}$, so $\mathfrak{P}$ contains all probability distributions on $\{0, 1\}^{\binom{n}{2}}$. In particular, $\mathfrak{P}$ contains the
Dirac measures, and hence by the discussion in the preceding section we have $\mathcal{M} = \mathrm{conv}(\mathcal{W})$, where $\mathcal{W}$ is the
set of all graphic sequences. Thus, it now remains to characterize when a given sequence $d \in \mathbb{N}_0^n$ is a degree
sequence of an unweighted graph, which is precisely what a classical result of Erdős and Gallai tells us.

Lemma 3.1 (Erdős-Gallai [13]). A sequence $(d_1, \dots, d_n) \in \mathbb{N}_0^n$ with $d_1 \ge d_2 \ge \dots \ge d_n$ is graphic if and only if $\sum_{i=1}^n d_i$ is even and

$$\sum_{i=1}^k d_i \le k(k-1) + \sum_{i=k+1}^n \min\{d_i, k\} \quad \text{for } k = 1, \dots, n. \tag{4}$$
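As an illustration of how criterion (4) is applied, the following minimal sketch checks the Erdős-Gallai conditions directly. The function name `is_graphic_unweighted` is a hypothetical choice; the input is sorted into non-increasing order before the inequalities are tested.

```python
def is_graphic_unweighted(d):
    """Check the Erdos-Gallai criterion (4) for a sequence of nonnegative integers."""
    d = sorted(d, reverse=True)  # the criterion assumes d_1 >= d_2 >= ... >= d_n
    n = len(d)
    if any(x < 0 for x in d) or sum(d) % 2 != 0:
        return False
    for k in range(1, n + 1):
        lhs = sum(d[:k])
        rhs = k * (k - 1) + sum(min(x, k) for x in d[k:])
        if lhs > rhs:
            return False
    return True
```

For example, `is_graphic_unweighted([3, 3, 3, 3])` accepts the degree sequence of the complete graph on four vertices, while `[3, 3, 1, 1]` fails the inequality at $k = 2$ (the left side is 6 and the right side is 4).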

The mean function is given by $\mu(t) = 1/(1 + \exp(t))$, which is strictly decreasing for all $t \in \mathrm{Dom}(Z_1) = \mathbb{R}$.
Given $d \in \mathbb{R}^n$, the MLE equation $\mathbb{E}_{P^*}[\deg(A)] = d$ becomes the system of equations

$$d_i = \sum_{j \neq i} \frac{1}{1 + \exp(\theta_i + \theta_j)} \quad \text{for } i = 1, \dots, n, \tag{5}$$

and we want to find a solution $\theta \in \Theta = \mathbb{R}^n$. We know that there exists a unique solution $\theta \in \mathbb{R}^n$ if and only
if $d \in \mathrm{conv}(\mathcal{W})^\circ$. However, given the characterization of $\mathcal{W}$ by the Erdős-Gallai criterion, it is unclear how
to decide whether a given $d$ belongs to $\mathrm{conv}(\mathcal{W})^\circ$. Nevertheless, in practice we can circumvent this issue by
employing the following iterative algorithm proposed in [8] to find the MLE solution; thus, given a sequence
$d$ in practice, we can run this algorithm and see whether it converges. Note that clearly if $d \in \mathbb{R}^n$ has $d_i \le 0$
for some $i = 1, \dots, n$ then $d \notin \mathrm{conv}(\mathcal{W})^\circ$, so we focus on the case $d \in \mathbb{R}_+^n$.

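Proposition 3.2 ([8, Theorem 1.5]). Given $d = (d_1, \dots, d_n) \in \mathbb{R}_+^n$, define the function $\varphi : \mathbb{R}^n \to \mathbb{R}^n$ by
$\varphi(x) = (\varphi_1(x), \dots, \varphi_n(x))$, where

$$\varphi_i(x) = -\log d_i + \log \sum_{j \neq i} \frac{1}{\exp(x_j) + \exp(-x_i)} \quad \text{for } i = 1, \dots, n.$$

Given any $x_0 \in \mathbb{R}^n$, let $x_{k+1} = \varphi(x_k)$ for $k \in \mathbb{N}_0$. Suppose $d \in \mathrm{conv}(\mathcal{W})^\circ$ and let $\theta \in \mathbb{R}^n$ be the unique
solution to (5). Then there exists a constant $C \in (0, 1)$ that depends on $\|x_0\|_\infty$ and $\|\theta\|_\infty$, such that

$$\|x_{k+2} - \theta\|_\infty \le C \|x_k - \theta\|_\infty \quad \text{for all } k \in \mathbb{N}_0.$$

In particular, this means $(x_k)$ converges exponentially fast to the MLE solution $\theta$. On the other hand, if
$d \notin \mathrm{conv}(\mathcal{W})^\circ$ then $(x_k)$ has a divergent subsequence.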
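The iteration in Proposition 3.2 can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the implementation of [8]: the names `expected_degrees`, `phi`, and `solve_mle`, the zero starting point, the stopping tolerance, and the iteration cap are all choices made here for the example.

```python
import math


def expected_degrees(theta):
    """Right-hand side of (5): d_i = sum_{j != i} 1 / (1 + exp(theta_i + theta_j))."""
    n = len(theta)
    return [
        sum(1.0 / (1.0 + math.exp(theta[i] + theta[j])) for j in range(n) if j != i)
        for i in range(n)
    ]


def phi(x, d):
    """One application of the map from Proposition 3.2."""
    n = len(x)
    return [
        -math.log(d[i])
        + math.log(sum(1.0 / (math.exp(x[j]) + math.exp(-x[i])) for j in range(n) if j != i))
        for i in range(n)
    ]


def solve_mle(d, x0=None, max_iter=10000, tol=1e-10):
    """Iterate x_{k+1} = phi(x_k) until successive iterates differ by less than tol."""
    x = list(x0) if x0 is not None else [0.0] * len(d)
    for _ in range(max_iter):
        x_new = phi(x, d)
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    # Assumption: failure to converge is reported, not interpreted as a proof
    # that d lies outside the interior of conv(W).
    raise RuntimeError("iteration did not converge within max_iter steps")
```

A quick way to exercise the sketch is to pick a parameter vector, compute its expected degree sequence with `expected_degrees`, and confirm that `solve_mle` recovers the parameters. At least three vertices are needed: for $n = 2$ the two equations in (5) coincide, so the solution is not unique.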

We summarize the discussion in this section in the following corollary. 

Corollary 3.3. Given $d = (d_1, \dots, d_n) \in \mathbb{R}^n$, the system of equations

$$d_i = \sum_{j \neq i} \frac{1}{1 + \exp(\theta_i + \theta_j)} \quad \text{for } i = 1, \dots, n$$

has a solution $\theta \in \mathbb{R}^n$ if and only if $d \in \mathrm{conv}(\mathcal{W})^\circ$. Furthermore, if such a solution exists then it is unique
and is the fixed point of the iterative algorithm in Proposition 3.2, and it has the property that

$$\mathrm{sgn}(d_i - d_j) = \mathrm{sgn}(\theta_j - \theta_i) \quad \text{for all } i \neq j.$$
3.2 Weighted graphs with continuous weights

Now let $S = \mathbb{R}_0$ and $\nu$ be the Lebesgue measure on $\mathbb{R}_0$, so we are considering weighted graphs with continuous
weights. In this case the marginal log-partition function is

$$Z_1(t) = \log \int_{\mathbb{R}_0} \exp(-ta)\, da = \begin{cases} \log(1/t) & \text{if } t > 0, \\ \infty & \text{if } t \le 0. \end{cases}$$

Thus $\mathrm{Dom}(Z_1) = \mathbb{R}_+$ and the natural parameter space is

$$\Theta = \{(\theta_1, \dots, \theta_n) \in \mathbb{R}^n : \theta_i + \theta_j > 0 \text{ for } i \neq j\},$$

which is open. For $\theta \in \Theta$, the edge weights $A_{ij}$ are independent exponential random variables with density

$$p^*_{ij}(a) = (\theta_i + \theta_j) \exp\big(-(\theta_i + \theta_j)\, a\big) \quad \text{for } a \in \mathbb{R}_0$$

and mean parameter $\mathbb{E}_{P^*}[A_{ij}] = 1/(\theta_i + \theta_j)$. Thus, the MLE equation $\mathbb{E}_{P^*}[\deg(A)] = d$ becomes the system
of equations (1) from the introduction, and we want to find a solution $\theta \in \Theta$.
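To make the continuous-weight model concrete, the sketch below draws the independent exponential edge weights and compares empirical degrees with the expected degrees $\sum_{j \neq i} 1/(\theta_i + \theta_j)$. The function names are hypothetical, and the parameter vector used in any run is an assumption of the example; the only library call relied on is `random.Random.expovariate`, whose argument is the rate (so the mean is its reciprocal).

```python
import random


def sample_weighted_graph(theta, rng):
    """Draw a symmetric weight matrix with A_ij ~ Exponential(rate = theta_i + theta_j)."""
    n = len(theta)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            rate = theta[i] + theta[j]
            if rate <= 0:
                raise ValueError("theta lies outside the natural parameter space")
            a[i][j] = a[j][i] = rng.expovariate(rate)
    return a


def degree_sequence(a):
    """Row sums of the weight matrix (the diagonal is zero)."""
    return [sum(row) for row in a]


def expected_degree_sequence(theta):
    """Expected degrees: d_i = sum_{j != i} 1 / (theta_i + theta_j)."""
    n = len(theta)
    return [
        sum(1.0 / (theta[i] + theta[j]) for j in range(n) if j != i)
        for i in range(n)
    ]
```

Averaging `degree_sequence` over many independent draws should approach `expected_degree_sequence(theta)`, which is the relation the MLE equation inverts.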

The system (1) is a special case of a general class that Sanyal, Sturmfels, and Vinzant [31] study using
algebraic geometry and matroid theory (extending work of Proudfoot and Speyer [28]). Define



k=0 ^ ^ 



in which |^'} is the Stirling number of the second kind and {x)^^^-^ — x{x — 2) • • • (x — 2k) is a generalized 
falling factorial. Then, there is a polynomial $H(d)$ in the $d_i$ such that for $d \in \mathbb{R}^n$ with $H(d) \neq 0$, the number
of solutions $\theta \in \mathbb{R}^n$ to (1) is $(-1)^n \chi(0)$. Moreover, the polynomial $H(d)$ has degree $2(-1)^n (n\chi(0) + \chi'(0))$
and characterizes those $d$ for which the equations above have multiple roots. We refer to [31] for more details.

Next, we characterize the set of graphic sequences $\mathcal{W}$ and determine its relation to the mean parameter
space $\mathcal{M}$. Recall that we say $d = (d_1, \dots, d_n)$ is a (weighted) graphic sequence if there is a graph $G$ with
edge weights in $\mathbb{R}_0$ that realizes $d$. In the case of unweighted graphs we have the combinatorial constraint
that there is at most one edge between any pair of vertices, which translates into a set of constraints in the
Erdős-Gallai criterion (4). In the case of weighted graphs, on the other hand, every edge can carry an arbitrarily
large weight, so intuitively we would expect that the criterion for a weighted graphic sequence is simpler
than the Erdős-Gallai criterion. This is indeed the case, as the following result shows.

Lemma 3.4. A sequence $(d_1, \dots, d_n) \in \mathbb{R}_0^n$ is graphic if and only if

$$\max_{1 \le i \le n} d_i \le \frac{1}{2} \sum_{i=1}^n d_i. \tag{6}$$

Proof. Clearly if $(d_1, \dots, d_n) \in \mathbb{R}_0^n$ is a graphic sequence then so is $(d_{\pi(1)}, \dots, d_{\pi(n)})$, for any permutation $\pi$ of
$\{1, \dots, n\}$. Thus without loss of generality we can assume $d_1 \ge d_2 \ge \dots \ge d_n$, and in this case condition (6)
reduces to

$$d_1 \le \sum_{i=2}^n d_i. \tag{7}$$

It is easy to see that if $(d_1, \dots, d_n) \in \mathbb{R}_0^n$ is graphic then (7) is satisfied, since the total weight that could
come out of vertex 1 is at most $\sum_{i=2}^n d_i$. For the converse direction, we first note the following easy properties
of weighted graphic sequences:

(i) The sequence $(c, c, \dots, c) \in \mathbb{R}_0^n$ is graphic for any $c \in \mathbb{R}_0$. For $n = 2$ this sequence is realized by the
graph on 2 vertices having edge weight $c$, and for $n \ge 3$ this sequence is realized by the "chain graph"
with weights $a_{i,i+1} = c/2$ for $i = 1, \dots, n$ (where the index $n + 1$ is identified with 1) and $a_{ij} = 0$ otherwise.

(ii) If $d = (d_1, \dots, d_n) \in \mathbb{R}_0^n$ satisfies (7) with an equality then $d$ is graphic, realized by the "star graph"
with weights $a_{1j} = d_j$ for $j = 2, \dots, n$ and $a_{ij} = 0$ otherwise.

(iii) If $d = (d_1, \dots, d_m) \in \mathbb{R}_0^m$ is graphic for some $m < n$ then so is $\tilde{d} = (d_1, \dots, d_m, 0, \dots, 0) \in \mathbb{R}_0^n$. This
follows since we can obtain a graph that realizes $\tilde{d}$ by inserting $n - m$ isolated vertices to the graph
that realizes $d$.

(iv) If $d^{(1)}$ and $d^{(2)}$ are graphic then so is $d^{(1)} + d^{(2)}$. This is because if $G_1$ and $G_2$ are graphs that realize
$d^{(1)}$ and $d^{(2)}$, respectively, then $d^{(1)} + d^{(2)}$ is realized by the graph $G$ whose edge weights are the sum
of the corresponding edge weights in $G_1$ and $G_2$.

Now we prove the converse direction by induction on $n$. For the base case $n = 2$, condition (7) gives us
$d_1 \le d_2 \le d_1$, so $d_1 = d_2$ and $(d_1, d_2)$ is graphic by property (i). Assume that the claim holds for $n - 1$, and we will prove
it also holds for $n$. So suppose we have a sequence $d = (d_1, \dots, d_n) \in \mathbb{R}_0^n$ satisfying (7), and let

$$K = \frac{1}{n-2} \Big( \sum_{i=2}^n d_i - d_1 \Big).$$

If $K = 0$ then (7) is satisfied with an equality, which implies $d$ is graphic by property (ii). So assume $K > 0$.
We consider two possibilities.

1. Suppose $K \ge d_n$. Then we can write $d = d^{(1)} + d^{(2)}$, where

$$d^{(1)} = (d_1 - d_n, d_2 - d_n, \dots, d_{n-1} - d_n, 0) \in \mathbb{R}_0^n$$

and

$$d^{(2)} = (d_n, d_n, \dots, d_n) \in \mathbb{R}_0^n.$$

The assumption $K \ge d_n$ implies $d_1 - d_n \le \sum_{i=2}^{n-1} (d_i - d_n)$, so $(d_1 - d_n, d_2 - d_n, \dots, d_{n-1} - d_n) \in \mathbb{R}_0^{n-1}$
is a graphic sequence by induction hypothesis. This implies $d^{(1)}$ is also graphic by property (iii).
Furthermore, $d^{(2)}$ is graphic by property (i), so $d = d^{(1)} + d^{(2)}$ is also a graphic sequence by property (iv).

2. Suppose $K < d_n$. Then write $d = d^{(3)} + d^{(4)}$, where

$$d^{(3)} = (d_1 - K, d_2 - K, \dots, d_n - K) \in \mathbb{R}_0^n$$

and

$$d^{(4)} = (K, K, \dots, K) \in \mathbb{R}_0^n.$$

By construction, $d^{(3)}$ satisfies $d_1 - K = \sum_{i=2}^n (d_i - K)$, so $d^{(3)}$ is a graphic sequence by property (ii).
Since $d^{(4)}$ is also graphic by property (i), we conclude that $d = d^{(3)} + d^{(4)}$ is graphic by property (iv).

This completes the induction step and finishes the proof. □
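The induction above is constructive, and the decomposition into constant sequences and star graphs can be turned into a small routine. The following is a minimal sketch under stated assumptions: the function names are hypothetical, exact rational arithmetic (`fractions.Fraction`) is used so that the equality case $K = 0$ can be tested reliably, and vertices are indexed from 0.

```python
from fractions import Fraction


def realize_weighted(d):
    """Return {(i, j): weight} with i < j realizing d, following the proof of Lemma 3.4."""
    vals = [Fraction(x) for x in d]
    n = len(vals)
    if n < 2 or min(vals) < 0 or 2 * max(vals) > sum(vals):
        raise ValueError("condition (6) fails, so the sequence is not graphic")
    weights = {}

    def add_edge(u, v, w):
        if w > 0:
            key = (min(u, v), max(u, v))
            weights[key] = weights.get(key, Fraction(0)) + w

    def add_constant(verts, c):
        # Property (i): a single edge for two vertices, otherwise a closed chain with weights c/2.
        if len(verts) == 2:
            add_edge(verts[0], verts[1], c)
        else:
            for t in range(len(verts)):
                add_edge(verts[t], verts[(t + 1) % len(verts)], c / 2)

    def add_star(verts, leaf_weights):
        # Property (ii): star centered at the first vertex.
        for v, w in zip(verts[1:], leaf_weights):
            add_edge(verts[0], v, w)

    def build(verts, x):
        # x is non-increasing and satisfies x[0] <= sum(x[1:]).
        m = len(verts)
        if m == 2:
            add_edge(verts[0], verts[1], x[0])
            return
        K = (sum(x[1:]) - x[0]) / (m - 2)
        if K == 0:
            add_star(verts, x[1:])
        elif K >= x[-1]:
            # Case 1: peel off the constant sequence (d_n, ..., d_n) and recurse on n - 1 vertices.
            add_constant(verts, x[-1])
            build(verts[:-1], [xi - x[-1] for xi in x[:-1]])
        else:
            # Case 2: constant sequence (K, ..., K) plus a star realizing the remainder.
            add_constant(verts, K)
            add_star(verts, [xi - K for xi in x[1:]])

    order = sorted(range(n), key=lambda i: -vals[i])
    build(order, [vals[i] for i in order])
    return weights


def degrees_from_weights(weights, n):
    """Degree sequence of the weighted graph described by the edge dictionary."""
    deg = [Fraction(0)] * n
    for (i, j), w in weights.items():
        deg[i] += w
        deg[j] += w
    return deg
```

For instance, `realize_weighted([5, 3, 3, 2, 1])` first peels off the constant sequence $(1, \dots, 1)$ (case 1) and then finishes with case 2 on the remaining four vertices; `degrees_from_weights` can be used to confirm that the returned weights reproduce the input.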

Observe that condition (6) is implied by the case $k = 1$ in the Erdős-Gallai criterion (4). This means any
sequence that satisfies the Erdős-Gallai criterion also satisfies condition (6), which is to be expected, since
any unweighted graph is also a weighted graph, so unweighted graphic sequences are also weighted graphic
sequences.

Given the criterion above, we can now write the set $\mathcal{W}$ of graphic sequences explicitly as

$$\mathcal{W} = \Big\{ (d_1, \dots, d_n) \in \mathbb{R}_0^n : \max_{1 \le i \le n} d_i \le \frac{1}{2} \sum_{i=1}^n d_i \Big\}.$$

We have the following simple property.

Proposition 3.5. The set $\mathcal{W}$ is convex, and $\mathcal{M} = \mathcal{W}$.
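Proof. We first prove that $\mathcal{W}$ is convex. Given $d = (d_1, \dots, d_n)$ and $d' = (d'_1, \dots, d'_n)$ in $\mathcal{W}$, and given
$0 \le t \le 1$, we note that

$$\max_{1 \le i \le n} \big(t d_i + (1 - t) d'_i\big) \le t \max_{1 \le i \le n} d_i + (1 - t) \max_{1 \le i \le n} d'_i \le \frac{t}{2} \sum_{i=1}^n d_i + \frac{1 - t}{2} \sum_{i=1}^n d'_i = \frac{1}{2} \sum_{i=1}^n \big(t d_i + (1 - t) d'_i\big),$$

which means $t d + (1 - t) d' \in \mathcal{W}$.

Next, recall that we already have $\mathcal{M} \subseteq \mathrm{conv}(\mathcal{W}) = \mathcal{W}$, so to conclude $\mathcal{M} = \mathcal{W}$ it remains to show that
$\mathcal{W} \subseteq \mathcal{M}$. Given $d \in \mathcal{W}$, let $G$ be a graph that realizes $d$ and let $w = (w_{ij})$ be the edge weights of $G$, so
that $d_i = \sum_{j \neq i} w_{ij}$ for all $i = 1, \dots, n$. Consider a distribution $P$ on $\mathbb{R}_0^{\binom{n}{2}}$ that sets each edge weight $A_{ij}$ to
be an independent exponential random variable with mean parameter $w_{ij}$. That is, the density $p$ of $P$ on
$a = (a_{ij}) \in \mathbb{R}_0^{\binom{n}{2}}$ is

$$p(a) = \prod_{\{i,j\}} \frac{1}{w_{ij}} \exp\Big(-\frac{a_{ij}}{w_{ij}}\Big).$$

Then clearly $\mathbb{E}_P[A_{ij}] = w_{ij}$, and thus

$$\mathbb{E}_P[\deg_i(A)] = \sum_{j \neq i} \mathbb{E}_P[A_{ij}] = \sum_{j \neq i} w_{ij} = d_i \quad \text{for } i = 1, \dots, n.$$

This shows that $d \in \mathcal{M}$, as desired. □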

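Noting that the mean function $\mu(t) = 1/t$ is strictly decreasing on $\mathrm{Dom}(Z_1) = \mathbb{R}_+$, we reach the following
conclusion.

Corollary 3.6. Given $d = (d_1, \dots, d_n) \in \mathbb{R}^n$, the system of MLE equations

$$d_i = \sum_{j \neq i} \frac{1}{\theta_i + \theta_j} \quad \text{for } i = 1, \dots, n \tag{8}$$

has a valid solution $\theta = (\theta_1, \dots, \theta_n) \in \Theta$ if and only if

$$d \in \mathcal{M}^\circ = \Big\{ (d'_1, \dots, d'_n) \in \mathbb{R}_+^n : \max_{1 \le i \le n} d'_i < \frac{1}{2} \sum_{i=1}^n d'_i \Big\}.$$

Furthermore, if $d \in \mathcal{M}^\circ$, then the valid solution $\theta \in \Theta$ is unique, and it has the property that

$$\mathrm{sgn}(d_i - d_j) = \mathrm{sgn}(\theta_j - \theta_i) \quad \text{for all } i \neq j.$$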

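Example 3.7. Let $n = 3$ and $d = (d_1, d_2, d_3) \in \mathbb{R}^3$ with $d_1 \ge d_2 \ge d_3$. In this case it is easy to see that the
system of equations (8) gives us

$$\frac{1}{\theta_1 + \theta_2} = \frac{1}{2}(d_1 + d_2 - d_3), \quad \frac{1}{\theta_1 + \theta_3} = \frac{1}{2}(d_1 - d_2 + d_3), \quad \text{and} \quad \frac{1}{\theta_2 + \theta_3} = \frac{1}{2}(-d_1 + d_2 + d_3),$$

from which we obtain a unique solution $\theta = (\theta_1, \theta_2, \theta_3)$. Recall that $\theta \in \Theta$ means $\theta_1 + \theta_2 > 0$, $\theta_1 + \theta_3 > 0$,
and $\theta_2 + \theta_3 > 0$, so the equation above tells us that $\theta \in \Theta$ if and only if $d_1 < d_2 + d_3$. In particular, this also
implies $d_3 > d_1 - d_2 \ge 0$, so $d \in \mathbb{R}_+^3$. Hence there is a unique solution $\theta \in \Theta$ if and only if $d \in \mathcal{M}^\circ$, as stated
in Corollary 3.6.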
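The closed form in Example 3.7 is easy to evaluate. The sketch below, with the hypothetical name `solve_three_vertex`, inverts the three pairwise sums and then solves the resulting linear system for $\theta$; the input ordering is not assumed, so all three triangle-type inequalities are checked.

```python
def solve_three_vertex(d):
    """Solve the retina equations (8) for n = 3 using the closed form of Example 3.7."""
    d1, d2, d3 = d
    if not (d1 < d2 + d3 and d2 < d1 + d3 and d3 < d1 + d2):
        raise ValueError("d is not in the interior of the mean parameter space")
    s12 = 2.0 / (d1 + d2 - d3)   # theta_1 + theta_2
    s13 = 2.0 / (d1 - d2 + d3)   # theta_1 + theta_3
    s23 = 2.0 / (-d1 + d2 + d3)  # theta_2 + theta_3
    theta1 = (s12 + s13 - s23) / 2.0
    theta2 = (s12 - s13 + s23) / 2.0
    theta3 = (-s12 + s13 + s23) / 2.0
    return theta1, theta2, theta3
```

Substituting the result back into (8) is a direct check, and the ordering of the returned values illustrates the sign property of Corollary 3.6: a larger $d_i$ corresponds to a smaller $\theta_i$.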






3.3 Weighted graphs with discrete weights 

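Finally, let $S = \mathbb{N}_0$ and $\nu$ be the counting measure, so we are considering weighted graphs with discrete (and
unbounded) weights. In this case

$$Z_1(t) = \log \sum_{a=0}^\infty \exp(-ta) = \begin{cases} -\log\big(1 - \exp(-t)\big) & \text{if } t > 0, \\ \infty & \text{if } t \le 0. \end{cases}$$

Thus $\mathrm{Dom}(Z_1) = (0, \infty)$, and the natural parameter space is

$$\Theta = \{(\theta_1, \dots, \theta_n) \in \mathbb{R}^n : \theta_i + \theta_j > 0 \text{ for } i \neq j\},$$

which is open. Given $\theta \in \Theta$, the edge weights $A_{ij}$ are independent geometric random variables with
probability mass function

$$\mathbb{P}^*(A_{ij} = a) = \big(1 - \exp(-\theta_i - \theta_j)\big) \exp\big(-(\theta_i + \theta_j)\, a\big) \quad \text{for } a \in \mathbb{N}_0.$$

The mean parameters are now

$$\mathbb{E}_{P^*}[A_{ij}] = \frac{\exp(-\theta_i - \theta_j)}{1 - \exp(-\theta_i - \theta_j)} = \frac{1}{\exp(\theta_i + \theta_j) - 1},$$

so the MLE equation $\mathbb{E}_{P^*}[\deg(A)] = d$ now becomes the system of equations

$$d_i = \sum_{j \neq i} \frac{1}{\exp(\theta_i + \theta_j) - 1} \quad \text{for } i = 1, \dots, n,$$

and we want to find a solution $\theta \in \Theta$.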
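The closed forms for the geometric case can be sanity-checked against truncated sums. This is a minimal sketch: the function names, the truncation length, and the test value of $t$ are assumptions of the example. The closed forms used are $Z_1(t) = -\log(1 - e^{-t})$ and $\mu(t) = 1/(e^t - 1)$ for $t > 0$.

```python
import math


def geometric_moments_truncated(t, terms=5000):
    """Approximate Z_1(t) and the mean by truncating the sums over a = 0, 1, 2, ..."""
    weights = [math.exp(-t * a) for a in range(terms)]
    total = sum(weights)
    mean = sum(a * w for a, w in enumerate(weights)) / total
    return math.log(total), mean


def geometric_moments_closed_form(t):
    """Closed forms for t > 0: Z_1(t) = -log(1 - exp(-t)), mean = 1 / (exp(t) - 1)."""
    if t <= 0:
        raise ValueError("t must be positive")
    return -math.log1p(-math.exp(-t)), 1.0 / math.expm1(t)
```

For moderate $t$ the truncated and closed-form values agree to many digits; as $t$ decreases toward 0 the geometric tail becomes heavy and more terms are needed.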

Since $\nu^{\binom{n}{2}}$ is the counting measure on $\mathbb{N}_0^{\binom{n}{2}}$, $\mathfrak{P}$ contains all the Dirac measures, so we have $\mathcal{M} = \mathrm{conv}(\mathcal{W})$
from the general discussion in Section 2. We now characterize $\mathcal{W}$, the set of all graphic sequences. As in
Lemma 3.4, we also have a simple characterization for when $d = (d_1, \dots, d_n)$ is a degree sequence of a graph
$G$ with edge weights in $\mathbb{N}_0$. The proof of the following result is inspired by [9].

Lemma 3.8. A sequence $(d_1, \dots, d_n) \in \mathbb{N}_0^n$ is graphic if and only if $\sum_{i=1}^n d_i$ is even and

$$\max_{1 \le i \le n} d_i \le \frac{1}{2} \sum_{i=1}^n d_i.$$

Proof. Without loss of generality we may assume $d_1 \ge d_2 \ge \dots \ge d_n$, so the inequality above becomes
$d_1 \le \sum_{i=2}^n d_i$. The necessary part is easy: if $(d_1, \dots, d_n)$ is a degree sequence of a graph $G$ with edge
weights $a_{ij} \in \mathbb{N}_0$, then $\sum_{i=1}^n d_i = 2 \sum_{\{i,j\}} a_{ij}$ is even, and the total weight coming out of vertex 1 is at
most $\sum_{i=2}^n d_i$. The converse direction is trivial if $n = 2$, so assume $n \ge 3$. We proceed by induction on
$s = \sum_{i=1}^n d_i$. The statement is clearly true for $s = 0$ and $s = 2$. Assume the statement is true for some
even $s \in \mathbb{N}$, and suppose we are given $d = (d_1, \dots, d_n) \in \mathbb{N}_0^n$ with $d_1 \ge \dots \ge d_n$, $\sum_{i=1}^n d_i = s + 2$, and
$d_1 \le \sum_{i=2}^n d_i$. Without loss of generality we may assume $d_n \ge 1$, for otherwise we can proceed with only the
nonzero elements of $d$. Let $k$ be the smallest index such that $d_k > d_{k+1}$, with $k = n - 1$ if $d_1 = \dots = d_n$,
and let $d' = (d_1, \dots, d_{k-1}, d_k - 1, d_{k+1}, \dots, d_n - 1)$. We will show that $d'$ is graphic. This will imply that $d$
is graphic, because if $G'$ is a graph with edge weights $a'_{ij}$ that realizes $d'$, then $d$ is realized by the graph $G$
with edge weights $a_{kn} = a'_{kn} + 1$ and $a_{ij} = a'_{ij}$ otherwise.

Now for $d' = (d'_1, \dots, d'_n)$ given above, we have $d'_1 \ge \dots \ge d'_n$ and $\sum_{i=1}^n d'_i = \sum_{i=1}^n d_i - 2 = s$ is even. So
it suffices to show that $d'_1 \le \sum_{i=2}^n d'_i$, for then we can apply the induction hypothesis to conclude that $d'$ is
graphic. If $k = 1$, then $d'_1 = d_1 - 1 \le \sum_{i=2}^n d_i - 1 = \sum_{i=2}^n d'_i$. If $k > 1$ then $d_1 = d_2$, so $d_1 < \sum_{i=2}^n d_i$ since
$d_n \ge 1$. In particular, since $\sum_{i=1}^n d_i$ is even, $\sum_{i=2}^n d_i - d_1 = \sum_{i=1}^n d_i - 2 d_1$ is also even, hence $\sum_{i=2}^n d_i - d_1 \ge 2$.
Therefore, $d'_1 = d_1 \le \sum_{i=2}^n d_i - 2 = \sum_{i=2}^n d'_i$. This finishes the proof of the lemma. □
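The inductive step above suggests a simple greedy construction: repeatedly add one unit of weight between vertex $k$ and the last (smallest nonzero) vertex. The sketch below follows that idea under stated assumptions: the function names are hypothetical, vertices are indexed from 0, and the two-vertex base case is handled by a single edge. It performs one step per unit of weight, so it is meant as an illustration rather than an efficient algorithm.

```python
def realize_integer(d):
    """Return {(i, j): weight} with nonnegative integer weights realizing d (Lemma 3.8)."""
    d = list(d)
    if not d or any(x < 0 for x in d) or sum(d) % 2 != 0 or 2 * max(d) > sum(d):
        raise ValueError("sequence is not graphic for weights in N_0")
    weights = {}

    def add_edge(u, v, w):
        key = (min(u, v), max(u, v))
        weights[key] = weights.get(key, 0) + w

    while sum(d) > 0:
        # Work only with the vertices of nonzero residual degree, in non-increasing order.
        verts = sorted((i for i in range(len(d)) if d[i] > 0), key=lambda i: -d[i])
        if len(verts) == 2:
            # Base case n = 2: the two residual degrees must be equal.
            u, v = verts
            add_edge(u, v, d[u])
            d[u] = d[v] = 0
            break
        # k is the first position with a strict drop, or the second-to-last if all are equal.
        k = len(verts) - 2
        for t in range(len(verts) - 1):
            if d[verts[t]] > d[verts[t + 1]]:
                k = t
                break
        u, v = verts[k], verts[-1]
        add_edge(u, v, 1)
        d[u] -= 1
        d[v] -= 1
    return weights


def integer_degrees(weights, n):
    """Degree sequence of the integer-weighted graph described by the edge dictionary."""
    deg = [0] * n
    for (i, j), w in weights.items():
        deg[i] += w
        deg[j] += w
    return deg
```

For example, `realize_integer([4, 2, 2])` places weight 2 on each of the two edges at vertex 0, a sequence that no simple unweighted graph on three vertices can realize.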
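The criterion above allows us to write an explicit form for $\mathcal{W}$,

$$\mathcal{W} = \Big\{ (d_1, \dots, d_n) \in \mathbb{N}_0^n : \sum_{i=1}^n d_i \text{ is even and } \max_{1 \le i \le n} d_i \le \frac{1}{2} \sum_{i=1}^n d_i \Big\}.$$

Now we need to characterize $\mathrm{conv}(\mathcal{W})$. Let $\mathcal{W}_1$ denote the set of all graphic sequences from Section 3.2, when
the edge weights are in $\mathbb{R}_0$,

$$\mathcal{W}_1 = \Big\{ (d_1, \dots, d_n) \in \mathbb{R}_0^n : \max_{1 \le i \le n} d_i \le \frac{1}{2} \sum_{i=1}^n d_i \Big\}.$$

It turns out that when we take the convex hull of $\mathcal{W}$, we essentially recover $\mathcal{W}_1$.

Lemma 3.9. $\overline{\mathrm{conv}(\mathcal{W})} = \mathcal{W}_1$.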



Proof. Clearly W C Wi , so conv( W) C Wi since Wi is closed and convex, by Proposition l3.5l Conversely, let 
Q denote the set of rational numbers. We will first show that WiRQ" C conv(yV) and then proceed by a limit 
argument. Let d G Wi n Q", so d = (di, . . . , d„) G Q" with di > and maxi<i<„ di < ^ ^7=i ^i- Choose 
K & N large enough such that Kdi G No for aU z = 1, . . . , n. Observe that 2Kd = {2Kdi, 2Kdn) G NJJ 
has the property that J27=i'^^'^i ^ even and maxi<i<„ 2iir(ii < ■^J^^^i'^^'^i^ ^'^ 2Kd G W by 

definition. Since = (0, . . . , 0) G W as well, all elements along the segment joining and 2Kd lie in 
conv(W), s o in parti cular, d = {2Kd)/(2K) G conv(W). This shows that Wi n Q" C conv(>V), and hence 
Wi n Q" C conv(W). 



To finish the proof it remains to show that $\overline{\mathcal{W}_1 \cap \mathbb{Q}^n} = \mathcal{W}_1$. On the one hand we have

$$\overline{\mathcal{W}_1 \cap \mathbb{Q}^n} \subseteq \overline{\mathcal{W}_1} \cap \overline{\mathbb{Q}^n} = \mathcal{W}_1 \cap \mathbb{R}^n = \mathcal{W}_1.$$

For the other direction, given $d \in \mathcal{W}_1$, choose $d_1, \dots, d_n \in \mathcal{W}_1$ such that $d, d_1, \dots, d_n$ are in general position, so that the convex hull $C$ of $d, d_1, \dots, d_n$ is full dimensional. This can be done, for instance, by noting that the following $n + 1$ points in $\mathcal{W}_1$ are in general position:

$$0, \; e_1 + e_2, \; e_1 + e_3, \; \dots, \; e_1 + e_n, \; e_1 + e_2 + \cdots + e_n,$$

where $e_1, \dots, e_n$ are the standard basis of $\mathbb{R}^n$. For each $m \in \mathbb{N}$ and $i = 1, \dots, n$, choose $d_i^{(m)}$ on the line segment between $d$ and $d_i$ such that the convex hull $C_m$ of $d, d_1^{(m)}, \dots, d_n^{(m)}$ is full dimensional and has diameter at most $1/m$. Since $C_m$ is full dimensional we can choose a rational point $r_m \in C_m \subseteq C \subseteq \mathcal{W}_1$. Thus we have constructed a sequence of rational points $(r_m)$ in $\mathcal{W}_1$ converging to $d$, which shows that $\mathcal{W}_1 \subseteq \overline{\mathcal{W}_1 \cap \mathbb{Q}^n}$. □
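The scaling step in the proof can be made concrete with a short computation. The following is a minimal Python sketch and not part of the original argument: the function names `in_W`, `in_W1`, and `scale_into_W` are hypothetical, and `math.lcm` with several arguments assumes Python 3.9 or later. Given a rational point of $\mathcal{W}_1$, it takes $K$ to be the least common multiple of the denominators and checks that $2Kd$ lands in $\mathcal{W}$.

```python
from fractions import Fraction
from math import lcm


def in_W(d):
    """Nonnegative integer sequence with even sum and max(d) <= sum(d)/2."""
    return (all(isinstance(x, int) and x >= 0 for x in d)
            and sum(d) % 2 == 0
            and 2 * max(d) <= sum(d))


def in_W1(d):
    """Nonnegative sequence (rational or real entries) with max(d) <= sum(d)/2."""
    return all(x >= 0 for x in d) and 2 * max(d) <= sum(d)


def scale_into_W(d):
    """For rational d in W1, return (K, 2*K*d) where K*d_i is an integer for all i."""
    d = [Fraction(x) for x in d]
    K = lcm(*(x.denominator for x in d))
    scaled = [int(2 * K * x) for x in d]
    return K, scaled


# Illustrative rational point of W1 (max = 1/2, half of the sum = 13/24).
d = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)]
K, scaled = scale_into_W(d)
print(K, scaled, in_W(scaled))
```

Dividing `scaled` by $2K$ recovers $d$, which is the point on the segment from $0$ to $2Kd$ used in the proof.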

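Remark 3.10. Since $\mathcal{W}$ is countable and discrete, i.e. $\|d - d'\|_\infty \ge 1$ for $d, d' \in \mathcal{W}$ with $d \ne d'$, it seems that $\mathrm{conv}(\mathcal{W})$ is closed, so we in fact have $\mathrm{conv}(\mathcal{W}) = \mathcal{W}_1$. However, since we are only interested in the interior of $\mathrm{conv}(\mathcal{W})$, knowing $\overline{\mathrm{conv}(\mathcal{W})} = \mathcal{W}_1$ is enough for our purposes.

Recalling that a convex set and its closure have the same interior points, the result above gives us

$$\mathcal{M}^\circ = \mathrm{conv}(\mathcal{W})^\circ = \big(\overline{\mathrm{conv}(\mathcal{W})}\big)^\circ = \mathcal{W}_1^\circ = \Big\{ (d_1, \dots, d_n) \in \mathbb{R}_+^n : \max_{1 \le i \le n} d_i < \frac{1}{2} \sum_{i=1}^n d_i \Big\}.$$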

Furthermore, noting that the mean function $\mu(t) = 1/(\exp(t) - 1)$ is strictly decreasing for $t \in \mathrm{Dom}(Z_1) = (0, \infty)$, we conclude the following.

Corollary 3.11. Given $d = (d_1, \dots, d_n) \in \mathbb{R}^n$, the system of MLE equations

$$d_i = \sum_{j \ne i} \frac{1}{\exp(\hat\theta_i + \hat\theta_j) - 1} \quad \text{for } i = 1, \dots, n \qquad (9)$$

has a valid solution $\hat\theta = (\hat\theta_1, \dots, \hat\theta_n) \in \Theta$ if and only if

$$d \in \mathcal{M}^\circ = \Big\{ (d'_1, \dots, d'_n) \in \mathbb{R}_+^n : \max_{1 \le i \le n} d'_i < \frac{1}{2} \sum_{i=1}^n d'_i \Big\}.$$

Furthermore, if $d \in \mathcal{M}^\circ$, then the valid solution $\hat\theta \in \Theta$ is unique, and it has the property that

$$\mathrm{sgn}(d_i - d_j) = \mathrm{sgn}(\hat\theta_j - \hat\theta_i) \quad \text{for all } i \ne j.$$

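Example 3.12. Let $n = 3$ and $d = (d_1, d_2, d_3) \in \mathbb{R}^3$ with $d_1 \ge d_2 \ge d_3$. One can easily check that the system of equations (9) gives us

$$\hat\theta_1 + \hat\theta_2 = \log\Big(1 + \frac{2}{d_1 + d_2 - d_3}\Big),$$

$$\hat\theta_1 + \hat\theta_3 = \log\Big(1 + \frac{2}{d_1 - d_2 + d_3}\Big),$$

$$\hat\theta_2 + \hat\theta_3 = \log\Big(1 + \frac{2}{-d_1 + d_2 + d_3}\Big),$$

from which we can obtain $\hat\theta = (\hat\theta_1, \hat\theta_2, \hat\theta_3)$. For $\hat\theta$ to be in $\Theta$ we want $\hat\theta_1 + \hat\theta_2 > 0$, $\hat\theta_1 + \hat\theta_3 > 0$, and $\hat\theta_2 + \hat\theta_3 > 0$, which means $2/(-d_1 + d_2 + d_3) > 0$, or equivalently, $d_1 < d_2 + d_3$. This also implies $d_3 > d_1 - d_2 \ge 0$, so $d \in \mathbb{R}_+^3$. Thus $\hat\theta \in \Theta$ if and only if $d \in \mathcal{M}^\circ$, as stated in Corollary 3.11.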
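The closed form in Example 3.12 is easy to check numerically. Below is a minimal Python sketch under the assumptions of the example (discrete weights, $n = 3$); the function name `mle_n3` and the test degree sequence are illustrative choices, not taken from the paper. It first solves for the three fitted edge means, then inverts the mean function $\mu(t) = 1/(\exp(t) - 1)$, and finally solves the linear system for $\hat\theta$.

```python
import math


def mle_n3(d1, d2, d3):
    """Closed-form solution of the MLE equations (9) for n = 3, discrete weights."""
    # Fitted mean weights of the three edges, from d_i = sum of incident edge means.
    x12 = (d1 + d2 - d3) / 2
    x13 = (d1 - d2 + d3) / 2
    x23 = (-d1 + d2 + d3) / 2
    if min(x12, x13, x23) <= 0:
        raise ValueError("d is not in the interior of the mean parameter space")
    # Invert mu(t) = 1 / (exp(t) - 1): t = log(1 + 1/x).
    s12, s13, s23 = (math.log(1 + 1 / x) for x in (x12, x13, x23))
    # Solve theta_i + theta_j = s_ij for the three unknowns.
    t1 = (s12 + s13 - s23) / 2
    t2 = (s12 - s13 + s23) / 2
    t3 = (-s12 + s13 + s23) / 2
    return t1, t2, t3


theta = mle_n3(3.0, 2.0, 2.0)
print(theta)
```

For this input the vertex with the largest degree receives the smallest parameter, in line with the sign property in Corollary 3.11.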



4 Existence and consistency of the MLE from one sample 

In this section we study the existence and consistency of the MLE of the parameters from one graph sample. Given $\theta \in \Theta$, let $A = (A_{ij})$ be a sample drawn from the random graph distribution parameterized by $\theta$. Let $d$ be the empirical degree sequence associated with $A$. As we saw in Section 2, the MLE $\hat\theta \in \Theta$ of $\theta$ is the solution to the system of equations (3). The MLE $\hat\theta$ is consistent if $\hat\theta$ converges in probability to $\theta$ as $n \to \infty$.

In the case of unweighted graphs, Chatterjee, Diaconis, and Sly [8] showed that with high probability the MLE $\hat\theta$ exists and is consistent.

Theorem 4.1 ([8, Theorem 1.3]). In the unweighted graph case, let $M = \|\theta\|_\infty$. Then with probability at least $1 - C(M) n^{-2}$ the MLE $\hat\theta$ exists and satisfies

$$\|\hat\theta - \theta\|_\infty \le C(M) \sqrt{\frac{\log n}{n}},$$

where $C(M)$ is a constant depending on $M$.

For the case of weighted graphs, recall that $\theta \in \Theta$ means $\theta_i + \theta_j > 0$ for all $i \ne j$. Our goal in this section is to prove the following consistency results.

Theorem 4.2. In the case of weighted graphs with continuous weights, let $L = \min_{i \ne j} \theta_i + \theta_j$ and $M = \max_{i \ne j} \theta_i + \theta_j$, so $0 < L \le M$. Let $k \ge 1$ be fixed. Then for sufficiently large $n$, with probability at least $1 - 3n^{-(k-1)}$ the MLE $\hat\theta$ exists and satisfies

$$\|\hat\theta - \theta\|_\infty \le \frac{150 M^2}{L} \sqrt{\frac{k \log n}{n}}.$$

Theorem 4.3. In the case of weighted graphs with discrete weights, let $L = \min_{i \ne j} \theta_i + \theta_j$ and $M = \max_{i \ne j} \theta_i + \theta_j$, so $0 < L \le M$. Let $k \ge 1$ be fixed. Then for sufficiently large $n$, with probability at least $1 - c(M)^n - 3n^{-(k-1)}$ the MLE $\hat\theta$ exists and satisfies

$$\|\hat\theta - \theta\|_\infty \le \frac{(\exp(5M) - 1)^2}{\exp(5M)} \sqrt{\frac{12}{\exp(L/2) - 1}} \sqrt{\frac{k \log n}{n}},$$

where $0 < c(M) < 1$ is a constant that depends on $M$.

4.1 Proof of Theorem 4.2

We are now working with weighted graphs with continuous weights. Recall that in this case the edge weights $A_{ij}$ are exponential random variables with mean $\mu(\theta_i + \theta_j) = 1/(\theta_i + \theta_j)$.

For the existence of the MLE $\hat\theta$, recall from Proposition 2.1 that $\hat\theta$ exists if and only if the empirical degree sequence $d$ belongs to the interior of the mean parameter space $\mathcal{M}^\circ$. Since $d$ necessarily belongs to $\mathcal{M}$, the MLE $\hat\theta$ does not exist precisely when $d$ falls on the boundary $\partial\mathcal{M} = \mathcal{M} \setminus \mathcal{M}^\circ$. From Corollary 3.6 we see that this boundary is given by

$$\partial\mathcal{M} = \Big\{ d' \in \mathbb{R}_0^n : d'_i = 0 \text{ for some } i \text{ or } \max_{1 \le i \le n} d'_i = \frac{1}{2} \sum_{i=1}^n d'_i \Big\},$$

which has Lebesgue measure 0. Since the distribution on $A$ is continuous and $d$ is a continuous function of $A$, we have $d \in \partial\mathcal{M}$ with probability 0. Thus, in this case the MLE $\hat\theta$ exists almost surely.
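The existence argument can be illustrated by simulation. The sketch below is only an illustration under assumed values: the helper name `sample_degrees`, the parameter range for $\theta$, and the seed are hypothetical. It draws one graph with independent exponential edge weights of rate $\theta_i + \theta_j$ and checks that the empirical degree sequence lies strictly inside the mean parameter space, that is, all degrees are positive and the largest degree is below half the total.

```python
import numpy as np


def sample_degrees(theta, rng):
    """Sample exponential edge weights with rate theta_i + theta_j; return degrees."""
    n = len(theta)
    rates = theta[:, None] + theta[None, :]
    iu = np.triu_indices(n, k=1)           # one weight per unordered pair {i, j}
    A = np.zeros((n, n))
    A[iu] = rng.exponential(scale=1.0 / rates[iu])
    A = A + A.T                            # symmetric weights, zero diagonal
    return A.sum(axis=1)


rng = np.random.default_rng(1)
theta = rng.uniform(0.2, 1.0, size=20)     # assumed parameters with theta_i + theta_j > 0
d = sample_degrees(theta, rng)
in_interior = bool(d.min() > 0 and d.max() < 0.5 * d.sum())
print(in_interior)
```

With strictly positive weights and $n \ge 4$, the condition $\max_i d_i < \frac{1}{2}\sum_i d_i$ holds whenever some edge not incident to the maximal vertex has positive weight, which is why the check succeeds for every continuous sample.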

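Now for the consistency of $\hat\theta$, recall from Section 2 that the MLE $\hat\theta$ satisfies the equation $-\nabla Z(\hat\theta) = d$. Let $d^* = -\nabla Z(\theta)$ denote the expected degree sequence of the maximum entropy distribution with parameter $\theta$. By the mean value theorem for vector-valued functions [20, p. 341], we can write

$$d - d^* = \nabla Z(\theta) - \nabla Z(\hat\theta) = J(\theta - \hat\theta). \qquad (10)$$

Here $J$ is a matrix obtained by integrating (element-wise) the Hessian of $Z$ on intermediate points between $\theta$ and $\hat\theta$:

$$J = \int_0^1 \nabla^2 Z(t\theta + (1 - t)\hat\theta) \, dt.$$

At any point $\xi = t\theta + (1 - t)\hat\theta$, the gradient $\nabla Z(\xi)$ is

$$(\nabla Z(\xi))_i = -\sum_{j \ne i} \mu(\xi_i + \xi_j) = -\sum_{j \ne i} \frac{1}{\xi_i + \xi_j}.$$

Therefore, the Hessian is given by

$$(\nabla^2 Z(\xi))_{ij} = \frac{1}{(\xi_i + \xi_j)^2} \text{ for } i \ne j, \quad \text{and} \quad (\nabla^2 Z(\xi))_{ii} = \sum_{j \ne i} \frac{1}{(\xi_i + \xi_j)^2} = \sum_{j \ne i} (\nabla^2 Z(\xi))_{ij}.$$

We call a nonnegative matrix such as $\nabla^2 Z$ diagonally balanced if each diagonal entry is equal to the sum of the other entries in its row. Since $\theta, \hat\theta \in \Theta$ and we assume $\theta_i + \theta_j \le M$, it follows that for $i \ne j$,

$$0 < \xi_i + \xi_j \le \max\{\theta_i + \theta_j, \hat\theta_i + \hat\theta_j\} \le \max\{M, 2\|\hat\theta\|_\infty\} \le M + 2\|\hat\theta\|_\infty.$$

Thus, the off-diagonal entries of $\nabla^2 Z(\xi)$ are bounded below by $1/(M + 2\|\hat\theta\|_\infty)^2$, and so $J$ is a symmetric and diagonally balanced matrix with off-diagonal entries bounded below by $1/(M + 2\|\hat\theta\|_\infty)^2$, being an average of such matrices. By the main result of [15], $J$ is invertible and its inverse satisfies

$$\|J^{-1}\|_\infty \le (M + 2\|\hat\theta\|_\infty)^2 \, \frac{3n - 4}{2(n - 1)(n - 2)} \le \frac{2}{n} (M + 2\|\hat\theta\|_\infty)^2,$$

where the last inequality holds for sufficiently large $n$. Inverting $J$ in (10) and applying the bound above gives

$$\|\theta - \hat\theta\|_\infty \le \|J^{-1}\|_\infty \, \|d - d^*\|_\infty \le \frac{2}{n} (M + 2\|\hat\theta\|_\infty)^2 \, \|d - d^*\|_\infty. \qquad (11)$$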
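The bound on the inverse can be sanity-checked numerically. The following is a hedged Python sketch: it builds the Hessian at a single assumed point $\xi$ (rather than the averaged matrix $J$), takes $\ell$ to be the smallest off-diagonal entry, and compares the induced $\infty$-norm of the inverse with $(3n - 4)/(2\ell(n - 1)(n - 2))$. The function name `hessian_continuous`, the range of $\xi$, and the seed are illustrative choices.

```python
import numpy as np


def hessian_continuous(xi):
    """Diagonally balanced Hessian with off-diagonal entries 1/(xi_i + xi_j)^2."""
    s = xi[:, None] + xi[None, :]
    H = 1.0 / s**2
    np.fill_diagonal(H, 0.0)
    np.fill_diagonal(H, H.sum(axis=1))     # diagonal = sum of the rest of the row
    return H


rng = np.random.default_rng(0)
n = 50
xi = rng.uniform(0.5, 1.5, size=n)         # assumed point with xi_i + xi_j > 0
H = hessian_continuous(xi)

ell = H[~np.eye(n, dtype=bool)].min()      # lower bound on off-diagonal entries
inv_norm = np.linalg.norm(np.linalg.inv(H), np.inf)   # maximum absolute row sum
bound = (3 * n - 4) / (2 * ell * (n - 1) * (n - 2))
print(inv_norm, bound, 2 / (n * ell))
```

For $n \ge 7$ the exact constant $(3n - 4)/(2(n - 1)(n - 2))$ is at most $2/n$, which is the simplification used in (11).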



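Since $A_{ij}$ is an exponential random variable with rate $\lambda = \theta_i + \theta_j \ge L$, Lemma A.2 tells us that $A_{ij} - 1/(\theta_i + \theta_j)$ is a $(4/L^2, L/2)$-subexponential random variable. Moreover, since $(A_{ij}, i \ne j)$ are independent, we can apply the concentration inequality for subexponential random variables [14]. Writing $d^*$ for the expected degree sequence, given any constant $k \ge 1$, for sufficiently large $n$ and for each $i = 1, \dots, n$, we have

$$P\Big( |d_i - d^*_i| \ge \sqrt{\frac{9kn \log n}{L^2}} \Big) \le P\Big( |d_i - d^*_i| \ge \sqrt{\frac{8k(n - 1) \log(n - 1)}{L^2}} \Big)$$

$$= P\Big( \Big| \frac{1}{n - 1} \sum_{j \ne i} \Big( A_{ij} - \frac{1}{\theta_i + \theta_j} \Big) \Big| \ge \sqrt{\frac{8k \log(n - 1)}{L^2 (n - 1)}} \Big)$$

$$\le 2 \exp\Big( -\frac{L^2 (n - 1)}{8} \cdot \frac{8k \log(n - 1)}{L^2 (n - 1)} \Big) = \frac{2}{(n - 1)^k},$$

and so by the union bound,

$$P\Big( \|d - d^*\|_\infty \ge \sqrt{\frac{9kn \log n}{L^2}} \Big) = P\Big( |d_i - d^*_i| \ge \sqrt{\frac{9kn \log n}{L^2}} \text{ for some } i = 1, \dots, n \Big) \le \frac{2n}{(n - 1)^k} \le \frac{3}{n^{k-1}}.$$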



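Assume now that $\|d - d^*\|_\infty \le \sqrt{9kn \log n / L^2}$, where $d^*$ is the expected degree sequence, which happens with probability at least $1 - 3n^{-(k-1)}$. Then from (11) and using the triangle inequality, we get

$$\|\hat\theta\|_\infty \le \|\theta - \hat\theta\|_\infty + \|\theta\|_\infty \le \frac{6}{L} \sqrt{\frac{k \log n}{n}} \, (M + 2\|\hat\theta\|_\infty)^2 + M. \qquad (12)$$

What we have shown is that $\|\hat\theta\|_\infty$ satisfies the inequality $G_n(\|\hat\theta\|_\infty) \ge 0$, where $G_n(x)$ is the quadratic function

$$G_n(x) = \frac{6}{L} \sqrt{\frac{k \log n}{n}} \, (M + 2x)^2 - x + M.$$

It is easy to check that for sufficiently large $n$ we have $G_n(2M) < 0$ and $G_n(\log n) < 0$. Since $G_n$ is a convex quadratic, it is negative on the whole interval $[2M, \log n]$. Thus, $G_n(\|\hat\theta\|_\infty) \ge 0$ means either $\|\hat\theta\|_\infty < 2M$ or $\|\hat\theta\|_\infty > \log n$. We claim that for sufficiently large $n$, $\|\hat\theta\|_\infty < 2M$. Suppose the contrary that $\|\hat\theta\|_\infty > \log n$. Since $\hat\theta_i + \hat\theta_j > 0$ for each $i \ne j$, there can be at most one index $i$ with $\hat\theta_i < 0$. We consider two cases: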



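1. Case 1: $\hat\theta_i \ge 0$ for all $i = 1, \dots, n$. Let $i^*$ be an index with $\hat\theta_{i^*} = \|\hat\theta\|_\infty > \log n$. Then, since $\hat\theta_{i^*} + \hat\theta_j \ge \hat\theta_{i^*}$ for $j \ne i^*$,

$$\frac{1}{M} \le \frac{1}{n - 1} \sum_{j \ne i^*} \frac{1}{\theta_{i^*} + \theta_j}$$

$$\le \frac{1}{n - 1} \Big| \sum_{j \ne i^*} \frac{1}{\theta_{i^*} + \theta_j} - \sum_{j \ne i^*} \frac{1}{\hat\theta_{i^*} + \hat\theta_j} \Big| + \frac{1}{n - 1} \sum_{j \ne i^*} \frac{1}{\hat\theta_{i^*} + \hat\theta_j}$$

$$\le \frac{1}{n - 1} \|d - d^*\|_\infty + \frac{1}{\|\hat\theta\|_\infty}$$

$$\le \frac{3\sqrt{kn \log n}}{L(n - 1)} + \frac{1}{\log n},$$

which cannot hold for sufficiently large $n$, as the right hand side on the last line tends to 0.

2. Case 2: $\hat\theta_i < 0$ for some $i = 1, \dots, n$. Without loss of generality assume $\hat\theta_1 < 0 < \hat\theta_2 \le \cdots \le \hat\theta_n$, so that $\hat\theta_n = \|\hat\theta\|_\infty$. Following the same chain of inequalities as in the previous case, we obtain

$$\frac{1}{M} \le \frac{1}{n - 1} \|d - d^*\|_\infty + \frac{1}{n - 1} \Big( \frac{1}{\hat\theta_n + \hat\theta_1} + \sum_{j=2}^{n-1} \frac{1}{\hat\theta_j + \hat\theta_n} \Big)$$

$$\le \frac{3\sqrt{kn \log n}}{L(n - 1)} + \frac{1}{(n - 1)(\hat\theta_n + \hat\theta_1)} + \frac{n - 2}{(n - 1)\|\hat\theta\|_\infty}$$

$$\le \frac{3\sqrt{kn \log n}}{L(n - 1)} + \frac{1}{(n - 1)(\hat\theta_n + \hat\theta_1)} + \frac{1}{\log n},$$

so for sufficiently large $n$,

$$\frac{1}{\hat\theta_1 + \hat\theta_n} \ge (n - 1) \Big( \frac{1}{M} - \frac{3\sqrt{kn \log n}}{L(n - 1)} - \frac{1}{\log n} \Big) \ge \frac{n}{2M},$$

and thus $\hat\theta_1 + \hat\theta_i \le \hat\theta_1 + \hat\theta_n \le 2M/n$ for each $i = 2, \dots, n$. However, then, using $\theta_1 + \theta_j \ge L$ to bound the first sum,

$$\frac{3\sqrt{kn \log n}}{L} \ge \|d - d^*\|_\infty \ge |d_1 - d^*_1| \ge -\sum_{j=2}^n \frac{1}{\theta_1 + \theta_j} + \sum_{j=2}^n \frac{1}{\hat\theta_1 + \hat\theta_j}$$

$$\ge -\frac{n - 1}{L} + \frac{n(n - 1)}{2M},$$

which cannot hold for sufficiently large $n$, since the right hand side grows quadratically in $n$.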

The analysis above shows that $\|\hat\theta\|_\infty < 2M$ for all sufficiently large $n$. Finally, from (11) we conclude that for sufficiently large $n$, with probability at least $1 - 3n^{-(k-1)}$ we have

$$\|\theta - \hat\theta\|_\infty \le \frac{2}{n} (5M)^2 \, \frac{3\sqrt{kn \log n}}{L} = \frac{150 M^2}{L} \sqrt{\frac{k \log n}{n}},$$

as desired.
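The phrase "sufficiently large $n$" in the argument above hides how large $n$ must be before both $G_n(2M) < 0$ and $G_n(\log n) < 0$ hold. The short Python sketch below evaluates $G_n$ from the proof for one assumed set of constants ($M = 1$, $L = 0.5$, $k = 2$, chosen purely for illustration); the function name `G` is a direct transcription of $G_n$.

```python
import math


def G(n, x, M, L, k):
    """G_n(x) = (6/L) * sqrt(k log n / n) * (M + 2x)^2 - x + M."""
    return (6.0 / L) * math.sqrt(k * math.log(n) / n) * (M + 2 * x) ** 2 - x + M


M, L, k = 1.0, 0.5, 2                      # illustrative constants, not from the paper
for n in (1e6, 1e9):
    print(n, G(n, 2 * M, M, L, k), G(n, math.log(n), M, L, k))
```

For these constants both values are still positive at $n = 10^6$ and both are negative at $n = 10^9$, so the threshold implied by the proof can be far beyond practical graph sizes even though the asymptotic statement holds.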

4.2 Proof of Theorem 4.3

We are now working with weighted graphs with discrete weights. Recall that in this case the edge weights $A_{ij}$ are geometric random variables with mean $\mu(\theta_i + \theta_j) = 1/(\exp(\theta_i + \theta_j) - 1)$.

As in the proof of Theorem 4.2, the MLE $\hat\theta$ does not exist if and only if the empirical degree sequence $d$ falls on the boundary $\partial\mathcal{M}$, which by Corollary 3.11 is given by

$$\partial\mathcal{M} = \Big\{ d' \in \mathbb{R}_0^n : d'_i = 0 \text{ for some } i \text{ or } \max_{1 \le i \le n} d'_i = \frac{1}{2} \sum_{i=1}^n d'_i \Big\}.$$

Using union bound and the fact that the edge weights $A_{ij}$ are independent, we have

$$P(d_i = 0 \text{ for some } i) \le \sum_{i=1}^n P(d_i = 0) = \sum_{i=1}^n P(A_{ij} = 0 \text{ for all } j \ne i)$$

$$= \sum_{i=1}^n \prod_{j \ne i} (1 - \exp(-\theta_i - \theta_j)) \le n (1 - \exp(-M))^{n-1}.$$

Furthermore, again by union bound,

$$P\Big( \max_{1 \le i \le n} d_i = \frac{1}{2} \sum_{i=1}^n d_i \Big) = P\Big( d_i = \sum_{j \ne i} d_j \text{ for some } i \Big) \le \sum_{i=1}^n P\Big( d_i = \sum_{j \ne i} d_j \Big).$$

Note that we have $d_i = \sum_{j \ne i} d_j$ for some $i$ if and only if the edge weights $A_{jk} = 0$ for all $j, k \ne i$. This occurs with the probability

$$P(A_{jk} = 0 \text{ for } j, k \ne i) = \prod_{j < k; \; j, k \ne i} (1 - \exp(-\theta_j - \theta_k)) \le (1 - \exp(-M))^{\binom{n-1}{2}}.$$

Therefore,

$$P(d \in \partial\mathcal{M}) \le P(d_i = 0 \text{ for some } i) + P\Big( \max_{1 \le i \le n} d_i = \frac{1}{2} \sum_{i=1}^n d_i \Big)$$

$$\le n (1 - \exp(-M))^{n-1} + n (1 - \exp(-M))^{\binom{n-1}{2}}$$

$$\le c(M)^n$$

for sufficiently large $n$, where $0 < c(M) < 1$ is a constant depending on $M$. This shows that the MLE $\hat\theta$ exists with probability at least $1 - c(M)^n$ for sufficiently large $n$.
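To get a feel for how quickly the non-existence probability decays, the bound derived above can be evaluated directly. This is a minimal Python sketch; the function name `nonexistence_bound` and the sample values of $n$ and $M$ are illustrative assumptions.

```python
import math


def nonexistence_bound(n, M):
    """n(1 - e^{-M})^{n-1} + n(1 - e^{-M})^{C(n-1, 2)}: bound on P(d on the boundary)."""
    q = 1.0 - math.exp(-M)
    return n * q ** (n - 1) + n * q ** math.comb(n - 1, 2)


for n in (10, 50, 100):
    print(n, nonexistence_bound(n, M=1.0))
```

The first term dominates for all but very small $n$, so the bound behaves like $n(1 - e^{-M})^{n-1}$ and is exponentially small in $n$ for fixed $M$.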

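Now assume that the MLE $\hat\theta$ exists. The proof of the consistency of $\hat\theta$ follows the same outline as in the proof of Theorem 4.2. Let $d^* = -\nabla Z(\theta)$ denote the expected degree sequence of the distribution with parameter $\theta$. By the mean value theorem, we can write

$$d - d^* = \nabla Z(\theta) - \nabla Z(\hat\theta) = J(\theta - \hat\theta), \qquad (13)$$

where $J = \int_0^1 \nabla^2 Z(t\theta + (1 - t)\hat\theta) \, dt$. In this case, at any point $\xi = t\theta + (1 - t)\hat\theta$ the gradient $\nabla Z(\xi)$ is

$$(\nabla Z(\xi))_i = -\sum_{j \ne i} \frac{1}{\exp(\xi_i + \xi_j) - 1},$$

and the Hessian is given by

$$(\nabla^2 Z(\xi))_{ij} = \frac{\exp(\xi_i + \xi_j)}{(\exp(\xi_i + \xi_j) - 1)^2} \text{ for } i \ne j, \quad \text{and} \quad (\nabla^2 Z(\xi))_{ii} = \sum_{j \ne i} \frac{\exp(\xi_i + \xi_j)}{(\exp(\xi_i + \xi_j) - 1)^2} = \sum_{j \ne i} (\nabla^2 Z(\xi))_{ij}.$$

Moreover, since $\theta, \hat\theta \in \Theta$ and we assume $\theta_i + \theta_j \le M$, for $i \ne j$ we have

$$0 < \xi_i + \xi_j \le \max\{\theta_i + \theta_j, \hat\theta_i + \hat\theta_j\} \le \max\{M, 2\|\hat\theta\|_\infty\} \le M + 2\|\hat\theta\|_\infty.$$

Therefore, $J$ is a symmetric and diagonally balanced matrix with off-diagonal entries bounded below by $\exp(M + 2\|\hat\theta\|_\infty)/(\exp(M + 2\|\hat\theta\|_\infty) - 1)^2$. By inverting $J$ in (13) and applying the bound on $J^{-1}$ from [15], we obtain

$$\|\theta - \hat\theta\|_\infty \le \|J^{-1}\|_\infty \, \|d - d^*\|_\infty \le \frac{2}{n} \, \frac{(\exp(M + 2\|\hat\theta\|_\infty) - 1)^2}{\exp(M + 2\|\hat\theta\|_\infty)} \, \|d - d^*\|_\infty. \qquad (14)$$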
Since $A_{ij}$ is a geometric random variable with emission probability

$$p = 1 - \exp(-\theta_i - \theta_j) \ge 1 - \exp(-L),$$

Lemma A.3 tells us that $A_{ij} - 1/(\exp(\theta_i + \theta_j) - 1)$ is $(\sigma^2, d)$-subexponential with

$$\sigma^2 = \frac{\sqrt{1 - p}}{(1 - \sqrt{1 - p})^2} \le \frac{\exp(L/2)}{(\exp(L/2) - 1)^2} \quad \text{and} \quad d = -\frac{1}{2} \log(1 - p) \ge -\frac{1}{2} \log \exp(-L) = \frac{L}{2}.$$

Furthermore, since the edge weights $(A_{ij}, i \ne j)$ are independent, we can apply the concentration inequality for subexponential random variables [2]. Writing $d^*$ for the expected degree sequence, given any constant $k \ge 1$, for sufficiently large $n$ and for each $i = 1, \dots, n$ we have

$$P\Big( |d_i - d^*_i| \ge \sqrt{\frac{3kn \log n}{\exp(L/2) - 1}} \Big) \le P\Big( \Big| \frac{1}{n - 1} \sum_{j \ne i} \Big( A_{ij} - \frac{1}{\exp(\theta_i + \theta_j) - 1} \Big) \Big| \ge \sqrt{\frac{2k \log(n - 1)}{(\exp(L/2) - 1)(n - 1)}} \Big)$$

$$\le 2 \exp(-k \log(n - 1)) = \frac{2}{(n - 1)^k},$$

and so by the union bound,

$$P\Big( \|d - d^*\|_\infty \ge \sqrt{\frac{3kn \log n}{\exp(L/2) - 1}} \Big) = P\Big( |d_i - d^*_i| \ge \sqrt{\frac{3kn \log n}{\exp(L/2) - 1}} \text{ for some } i = 1, \dots, n \Big) \le \frac{3}{n^{k-1}}.$$

Assume now that $\|d - d^*\|_\infty \le \sqrt{3kn \log n / (\exp(L/2) - 1)}$, which happens with probability at least $1 - 3n^{-(k-1)}$. Then from (14) and using the triangle inequality, we get



$$\|\hat\theta\|_\infty \le \|\hat\theta - \theta\|_\infty + \|\theta\|_\infty \le \sqrt{\frac{12 k \log n}{n (\exp(L/2) - 1)}} \, \frac{(\exp(M + 2\|\hat\theta\|_\infty) - 1)^2}{\exp(M + 2\|\hat\theta\|_\infty)} + M. \qquad (15)$$

This means $\|\hat\theta\|_\infty$ satisfies the inequality $H_n(\|\hat\theta\|_\infty) \ge 0$, where $H_n(x)$ is the function

$$H_n(x) = \sqrt{\frac{12 k \log n}{n (\exp(L/2) - 1)}} \, \frac{(\exp(M + 2x) - 1)^2}{\exp(M + 2x)} - x + M.$$

Note that $H_n$ is a convex function, so it assumes the value 0 at most twice. Moreover, it is easy to check that for all sufficiently large $n$, we have $H_n(2M) < 0$ and $H_n(\frac{1}{4} \log n) < 0$. Therefore, $H_n(\|\hat\theta\|_\infty) \ge 0$ implies either $\|\hat\theta\|_\infty < 2M$ or $\|\hat\theta\|_\infty > \frac{1}{4} \log n$. We claim that for all sufficiently large $n$, $\|\hat\theta\|_\infty < 2M$. Suppose the contrary that $\|\hat\theta\|_\infty > \frac{1}{4} \log n$. Since $\hat\theta_i + \hat\theta_j > 0$ for each $i \ne j$, there can be at most one index $i$ with $\hat\theta_i < 0$. We consider two cases:

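1. Case 1: $\hat\theta_i \ge 0$ for all $i = 1, \dots, n$. Let $i^*$ be an index with $\hat\theta_{i^*} = \|\hat\theta\|_\infty > \frac{1}{4} \log n$. Then, since $\hat\theta_{i^*} + \hat\theta_j \ge \hat\theta_{i^*}$ for $j \ne i^*$,

$$\frac{1}{\exp(M) - 1} \le \frac{1}{n - 1} \sum_{j \ne i^*} \frac{1}{\exp(\theta_{i^*} + \theta_j) - 1}$$

$$\le \frac{1}{n - 1} \Big| \sum_{j \ne i^*} \frac{1}{\exp(\theta_{i^*} + \theta_j) - 1} - \sum_{j \ne i^*} \frac{1}{\exp(\hat\theta_{i^*} + \hat\theta_j) - 1} \Big| + \frac{1}{n - 1} \sum_{j \ne i^*} \frac{1}{\exp(\hat\theta_{i^*} + \hat\theta_j) - 1}$$

$$\le \frac{1}{n - 1} \|d - d^*\|_\infty + \frac{1}{\exp(\|\hat\theta\|_\infty) - 1}$$

$$\le \frac{1}{n - 1} \sqrt{\frac{3kn \log n}{\exp(L/2) - 1}} + \frac{1}{n^{1/4} - 1},$$

which cannot hold for sufficiently large $n$, as the right hand side on the last line tends to 0.

2. Case 2: $\hat\theta_i < 0$ for some $i = 1, \dots, n$. Without loss of generality assume $\hat\theta_1 < 0 < \hat\theta_2 \le \cdots \le \hat\theta_n$, so that $\hat\theta_n = \|\hat\theta\|_\infty$. Following the same chain of inequalities as in the previous case, we obtain

$$\frac{1}{\exp(M) - 1} \le \frac{1}{n - 1} \|d - d^*\|_\infty + \frac{1}{n - 1} \Big( \frac{1}{\exp(\hat\theta_n + \hat\theta_1) - 1} + \sum_{j=2}^{n-1} \frac{1}{\exp(\hat\theta_j + \hat\theta_n) - 1} \Big)$$

$$\le \frac{1}{n - 1} \sqrt{\frac{3kn \log n}{\exp(L/2) - 1}} + \frac{1}{(n - 1)(\exp(\hat\theta_n + \hat\theta_1) - 1)} + \frac{n - 2}{(n - 1)(\exp(\|\hat\theta\|_\infty) - 1)}$$

$$\le \frac{1}{n - 1} \sqrt{\frac{3kn \log n}{\exp(L/2) - 1}} + \frac{1}{(n - 1)(\exp(\hat\theta_n + \hat\theta_1) - 1)} + \frac{1}{n^{1/4} - 1},$$

so for sufficiently large $n$,

$$\frac{1}{\exp(\hat\theta_1 + \hat\theta_n) - 1} \ge (n - 1) \Big( \frac{1}{\exp(M) - 1} - \frac{1}{n - 1} \sqrt{\frac{3kn \log n}{\exp(L/2) - 1}} - \frac{1}{n^{1/4} - 1} \Big) \ge \frac{n}{2(\exp(M) - 1)}.$$

Therefore, for $i = 2, \dots, n$ we also have

$$\frac{1}{\exp(\hat\theta_1 + \hat\theta_i) - 1} \ge \frac{1}{\exp(\hat\theta_1 + \hat\theta_n) - 1} \ge \frac{n}{2(\exp(M) - 1)}.$$

However, this implies, using $\theta_1 + \theta_j \ge L$ to bound the first sum,

$$\sqrt{\frac{3kn \log n}{\exp(L/2) - 1}} \ge \|d - d^*\|_\infty \ge |d_1 - d^*_1| \ge -\sum_{j=2}^n \frac{1}{\exp(\theta_1 + \theta_j) - 1} + \sum_{j=2}^n \frac{1}{\exp(\hat\theta_1 + \hat\theta_j) - 1}$$

$$\ge -\frac{n - 1}{\exp(L) - 1} + \frac{n(n - 1)}{2(\exp(M) - 1)},$$

which cannot hold for sufficiently large $n$, since the right hand side grows quadratically in $n$.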

The analysis above shows that $\|\hat\theta\|_\infty < 2M$ for all sufficiently large $n$. Finally, using (14) and taking into account the existence of the MLE, we conclude that for sufficiently large $n$, with probability at least $1 - c(M)^n - 3n^{-(k-1)}$ the MLE $\hat\theta$ exists and satisfies

$$\|\theta - \hat\theta\|_\infty \le \frac{2}{n} \, \frac{(\exp(5M) - 1)^2}{\exp(5M)} \sqrt{\frac{3kn \log n}{\exp(L/2) - 1}} = \frac{(\exp(5M) - 1)^2}{\exp(5M)} \sqrt{\frac{12}{\exp(L/2) - 1}} \sqrt{\frac{k \log n}{n}},$$

as desired.

A Subexponential random variables

Recall that a zero-mean random variable $X$ is subexponential if $E[\exp(tX)] \le \exp(\sigma^2 t^2 / 2)$ for all $|t| \le d$, for some parameters $(\sigma^2, d)$. In particular, a $(\sigma^2, d)$-subexponential random variable $X$ is also $(\sigma_+^2, d_-)$-subexponential for any $\sigma_+^2 \ge \sigma^2$ and $d_- \le d$. Subexponential random variables satisfy the following concentration inequality [14].

Theorem A.1. Let $X_1, \dots, X_n$ be independent $(\sigma^2, d)$-subexponential random variables. Then

$$P\Big( \Big| \frac{1}{n} \sum_{i=1}^n X_i \Big| \ge t \Big) \le 2 Q_n(t),$$

where

$$Q_n(t) = \begin{cases} \exp(-nt^2 / 2\sigma^2) & \text{if } 0 \le t \le d\sigma^2, \\ \exp(-dnt/2) & \text{if } t > d\sigma^2. \end{cases}$$
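The two regimes of Theorem A.1 translate directly into code. The sketch below is an illustration with hypothetical names (`subexp_tail_bound`) and assumed constants ($L = 0.5$, $k = 2$, and $m = 1000$ summands). It evaluates the bound $2Q_m(t)$ with the exponential-weight parameters $(\sigma^2, d) = (4/L^2, L/2)$ and the threshold $t = \sqrt{8k \log m / (L^2 m)}$ used in the proof of Theorem 4.2, where $m = n - 1$.

```python
import math


def subexp_tail_bound(n, t, sigma2, d):
    """Upper bound 2 * Q_n(t) on P(|mean of n variables| >= t) from Theorem A.1."""
    if t <= d * sigma2:
        return 2.0 * math.exp(-n * t * t / (2.0 * sigma2))   # Gaussian-like regime
    return 2.0 * math.exp(-d * n * t / 2.0)                  # exponential regime


L, k, m = 0.5, 2, 1000                    # illustrative constants; m plays the role of n - 1
sigma2, d = 4.0 / L**2, L / 2.0
t = math.sqrt(8 * k * math.log(m) / (L**2 * m))
bound = subexp_tail_bound(m, t, sigma2, d)
print(t <= d * sigma2, bound, 2.0 / m**k)
```

Here $t$ falls in the first regime, and the bound simplifies to $2/m^k$, matching the per-vertex estimate $2/(n-1)^k$ in the continuous case.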

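Lemma A.2. If $W \sim \mathrm{Exponential}(1/\lambda)$, then $Z = W - 1/\lambda$ is $(4/\lambda^2, \lambda/2)$-subexponential.

Proof. Recall that $E[\exp(tW)] = \lambda/(\lambda - t)$ for $t < \lambda$. Consider $|t| \le \lambda/2$, so $-1/2 \le t/\lambda \le 1/2$. By Taylor expansion,

$$\log\Big(1 - \frac{t}{\lambda}\Big) = -\frac{t}{\lambda} - \frac{t^2}{\lambda^2} \, \frac{1}{2(1 - \xi)^2} \ge -\frac{t}{\lambda} - \frac{t^2}{\lambda^2} \, \frac{1}{2(1 - 1/2)^2} = -\frac{t}{\lambda} - \frac{2t^2}{\lambda^2},$$

where $\xi$ is some number between $0$ and $t/\lambda$. This shows that

$$E[\exp(tZ)] = \frac{1}{\exp(t/\lambda)(1 - t/\lambda)} = \exp\Big( -\frac{t}{\lambda} - \log\Big(1 - \frac{t}{\lambda}\Big) \Big) \le \exp\Big( \frac{2t^2}{\lambda^2} \Big),$$

which means $Z$ is $(4/\lambda^2, \lambda/2)$-subexponential. □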
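The inequality in Lemma A.2 can be checked on a grid. The following Python sketch compares the exact moment generating function of $Z = W - 1/\lambda$ with the bound $\exp(2t^2/\lambda^2)$ over $|t| \le \lambda/2$; the rate $\lambda = 2$ and the grid size are arbitrary illustrative choices.

```python
import numpy as np

lam = 2.0                                   # illustrative rate
t = np.linspace(-lam / 2, lam / 2, 201)     # the range |t| <= lambda / 2
mgf = np.exp(-t / lam) / (1.0 - t / lam)    # E[exp(tZ)] for Z = W - 1/lambda
bound = np.exp(2.0 * t**2 / lam**2)         # exp(sigma^2 t^2 / 2) with sigma^2 = 4/lambda^2
print(float(np.max(mgf / bound)))
```

The ratio stays at or below 1 on the whole grid, with equality only at $t = 0$.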
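Lemma A.3. If $W \sim \mathrm{Geometric}(p)$, then $Z = W - (1 - p)/p$ is $(\sigma^2, d)$-subexponential with

$$\sigma^2 = \frac{\sqrt{1 - p}}{(1 - \sqrt{1 - p})^2} \quad \text{and} \quad d = -\frac{1}{2} \log(1 - p).$$

Proof. Recall that

$$E[\exp(tW)] = \frac{p}{1 - (1 - p)e^t} \quad \text{for } t < -\log(1 - p).$$

By Taylor expansion,

$$\log\big(1 - (1 - p)e^t\big) = \log p - \frac{1 - p}{p} \, t - \frac{t^2}{2} \, \frac{(1 - p)e^\xi}{(1 - (1 - p)e^\xi)^2},$$

where $\xi$ is some number between $0$ and $t$. Note that for $|t| \le -\log(1 - p)/2$ we have $(1 - p)e^\xi \le \sqrt{1 - p}$, and hence

$$\frac{(1 - p)e^\xi}{(1 - (1 - p)e^\xi)^2} \le \frac{\sqrt{1 - p}}{(1 - \sqrt{1 - p})^2},$$

so

$$\log\big(1 - (1 - p)e^t\big) \ge \log p - \frac{1 - p}{p} \, t - \frac{t^2}{2} \, \frac{\sqrt{1 - p}}{(1 - \sqrt{1 - p})^2}.$$

This shows that

$$E[\exp(tZ)] = \frac{p \exp(-(1 - p)t/p)}{1 - (1 - p)e^t} \le \exp\Big( \frac{t^2}{2} \, \frac{\sqrt{1 - p}}{(1 - \sqrt{1 - p})^2} \Big),$$

which means $Z$ is $(\sigma^2, d)$-subexponential with the stated parameters. □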